参考文献
[1] Johnson, Sam and Mark Everingham. Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation[C]. British Machine Vision Conference, 2010. DOI:10.5244/C.24.12 [2] Sapp, Benjamin and Ben Taskar. MODEC: Multimodal Decomposable Models for Human Pose Estimation[C]. 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013: 3674-3681. DOI:10.1109/CVPR.2013.471 [3] Andriluka, Mykhaylo, et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 3686-3693. DOI:10.1109/CVPR.2014.471 [4] Lin Tsung-Yi, et al. Microsoft COCO: Common Objects in Context[C]. European Conference on Computer Vision, 2014. DOI:10.1007/978-3-319-10602-1_48 [5] Li, Jiefeng, et al. CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark[C]. 2019 IEEE/ CVF Conference on Computer Vision and Pattern Recognition( CVPR), 2018: 10855-10864. DOI:10.1109/CVPR.2019.01112 [6] Zhang, Weiyu, et al. From Actemes to Action: A Strongly- Supervised Representation for Detailed Action Understanding[C]. 2013 IEEE International Conference on Computer Vision, 2013: 2248-2255. DOI:10.1109/ICCV.2013.280 [7] Jhuang, Hueihan, et al. Towards Understanding Action Recognition[C]. 2013 IEEE International Conference on Computer Vision, 2013: 3192-3199. DOI:10.1109/ICCV.2013.396 [8] Iqbal, Umar, et al. Pose for Action-Action for Pose[C]. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition(FG 2017), 2016: 438-445. DOI:10.1109/FG.2017.61 [9] Andriluka, Mykhaylo, et al. PoseTrack: A Benchmark for Human Pose Estimation and Tracking[C]. 2018 IEEE/ CVF Conference on Computer Vision and Pattern Recognition, 2017: 5167-5176. DOI:10.1109/CVPR.2018.00542 [10] Lin, Weiyao, et al. Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events. ArXiv abs/2005. 04490, 2020: n. pag. DOI:10.48550/arXiv.2005.04490 [11] Sigal, Leonid, et al. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion[J]. International Journal of Computer Vision, 2010, 87: 4-27. DOI:10.1007/s11263-009-0273-6 [12] Ionescu, Catalin, et al. Human3. 6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments[C]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36: 1325-1339. DOI:10.1109/TPAMI.2013.248 [13] Joo, Hanbyul, et al. Panoptic Studio: A Massively Multiview System for Social Motion Capture[C]. 2015 IEEE International Conference on Computer Vision(ICCV), 2015: 3334-3342. DOI:10.1109/ICCV.2015.381 [14] Mehta, Dushyant, et al. Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision[C]. 2017 International Conference on 3D Vision(3DV), 2016: 506-516. DOI:10.1109/3DV.2017.00064 [15] Varol, Gül, et al. Learning from Synthetic Humans[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2017: 4627-4635. DOI:10.1109/CVPR.2017.492 [16] Fabbri, Matteo, et al. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World[C]. European Conference on Computer Vision, 2018. DOI:10.1007/978-3-030-01225-0_27 [17] Marcard, Timo von, et al. Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera[C]. European Conference on Computer Vision, 2018. DOI:10.1007/978-3-030-01249-6_37 [18] Toshev, Alexander and Christian Szegedy. DeepPose: Human Pose Estimation via Deep Neural Networks[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2013: 1653-1660. DOI:10.1109/CVPR.2014.214 [19] Sun, Xiao, et al. Compositional Human Pose Regression[C]. 2017 IEEE International Conference on Computer Vision(ICCV), 2017: 2621-2630. DOI:10.1109/ICCV.2017.284 [20] Kipf, Thomas and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. ArXiv abs/1609. 02907, 2016: n. pag. DOI:10.48550/arXiv.1609.02907 [21] Qiu, Lingteng, et al. Peeking into occluded joints: A novel framework for crowd pose estimation. ArXiv abs/ 2003. 10506, 2020: n. pag. DOI:10.1007/978-3-030-58529-7_29 [22] Vaswani, Ashish, et al. Attention is All you Need. NIPS, 2017. DOI:10.48550/arXiv.1706.03762 [23] Li, Ke, et al. Pose Recognition with Cascade Transformers[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2021: 1944-1953. DOI:10.1109/CVPR46437.2021.00198 [24] Ramakrishna, Varun, et al. Pose Machines: Articulated Pose Estimation via Inference Machines[C]. European Conference on Computer Vision, 2014. DOI:10.1007/978-3-319-10605-2_3 [25] Wei, Shih-En, et al. Convolutional Pose Machines[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2016: 4724-4732. DOI:10.1109/CVPR.2016.511 [26] He, Kaiming, et al. Deep Residual Learning for Image Recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2015: 770-778. DOI:10.1109/cvpr.2016.90 [27] Chen, Yilun, et al. Cascaded Pyramid Network for Multi-person Pose Estimation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 7103-7112. DOI:10.1109/CVPR.2018.00742 [28] Newell, Alejandro, et al. Stacked Hourglass Networks for Human Pose Estimation[C]. European Conference on Computer Vision, 2016. DOI:10.1007/978-3-319-46484-8_29 [29] Chu, Xiao, et al. Multi-context Attention for Human Pose Estimation[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2017: 5669-5678. DOI:10.1109/CVPR.2017.601 [30] Ke, Lipeng, et al. Multi-Scale Structure-Aware Network for Human Pose Estimation[C]. European Conference on Computer Vision, 2018. DOI:10.1007/978-3-030-01216-8_44 [31] Tang, Weixian and Ying Wu. Does Learning Specific Features for Related Parts Help Human Pose Estimation?”[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2019: 1107-1116. DOI:10.1109/CVPR.2019.00120 [32] Sun, Ke, et al. Deep High-Resolution Representation Learning for Human Pose Estimation[C]. 2019 IEEE/ CVF Conference on Computer Vision and Pattern Recognition( CVPR), 2019: 5686-5696. DOI:10.1109/CVPR.2019.00584 [33] Liu, Zhenguang, et al. Deep Dual Consecutive Network for Human Pose Estimation[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2021: 525-534. DOI:10.1109/CVPR46437.2021.00059 [34] Liu, Huajun, et al. Polarized Self-Attention: Towards High-quality Pixel-wise Regression. ArXiv abs/2107. 00782, 2021: n. pag. DOI:10.1016/j.neucom.2022.07.054 [35] Xu, Yufei, et al. ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation. ArXiv abs/ 2204. 12484, 2022: n. pag. DOI:10.48550/arXiv.2204.12484 [36] Cao, Zhe, et al. Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2016: 1302-1310. DOI:10.1109/CVPR.2017.143 [37] Cheng, Bowen, et al. HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2019: 5385-5394. DOI:10.1109/cvpr42600.2020.00543 [38] Luo, Zhengxiong, et al. Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2020: 13259-13268. DOI:10.1109/CVPR46437.2021.01306 [39] Jin, Sheng, et al. Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation. ArXiv abs/2007. 11864, 2020: n. pag. DOI:10.1007/978-3-030-58571-6_42 [40] Wang, Dongkai, et al. Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference[R]. Neural Information Processing Systems, 2021. DOI:10.24963/ijcai.2021/5271 [41] Geng, Zigang, et al. Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2021: 14671-14681. DOI:10.1109/CVPR46437.2021.01444 [42] Bras'o, Guillem, et al. The Center of Attention: Center- Keypoint Grouping via Attention for Multi-Person Pose Estimation[C]. 2021 IEEE/CVF International Conference on Computer Vision(ICCV), 2021: 11833-11843. DOI:10.1109/ICCV48922.2021.01164 [43] Luvizon, Diogo Carbonera, et al. 2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 5137-5146. DOI:10.1109/CVPR.2018.00539 [44] Pavlakos, Georgios, et al. Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2016: 1263-1272. DOI:10.1109/CVPR.2017.139 [45] Zhou, Kun, et al. HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation[C]. 2019 IEEE/CVF International Conference on Computer Vision(ICCV), 2019: 2344-2353. DOI:10.1109/ICCV.2019.00243 [46] Dabral, Rishabh, et al. Learning 3D Human Pose from Structure and Motion[C]. European Conference on Computer Vision, 2017. DOI:10.1007/978-3-030-01240-3_41 [47] Sun, Xiao, et al. Compositional Human Pose Regression[C]. 2017 IEEE International Conference on Computer Vision(ICCV), 2017: 2621-2630. DOI:10.1109/ICCV.2017.284 [48] Martinez, Julieta, et al. A Simple Yet Effective Baseline for 3d Human Pose Estimation[C]. 2017 IEEE International Conference on Computer Vision(ICCV), 2017: 2659-2668. DOI:10.1109/ICCV.2017.288 [49] Pavllo, Dario, et al. 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2018: 7745-7754. DOI:10.1109/CVPR.2019.00794 [50] Zeng, Ailing, et al. SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach. ArXiv abs/2007. 09389, 2020: n. pag. DOI:10.1007/978-3-030-58568-6_30 [51] Chen, Tianlang, et al. Anatomy-Aware 3D Human Pose Estimation With Bone-Based Pose Decomposition[C]. IEEE Transactions on Circuits and Systems for Video Technology 32, 2021: 198-209. DOI:10.1109/TCSVT.2021.3057267 [52] Zhan, Yu-Wei, et al. Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2022: 13106-13115. DOI:10.1109/CVPR52688.2022.01277 [53] Zhao, Long, et al. Semantic Graph Convolutional Networks for 3D Human Pose Regression[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2019: 3420-3430. DOI:10.1109/CVPR.2019.00354 [54] Cai, Yujun, et al. Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks[C]. 2019 IEEE/CVF International Conference on Computer Vision(ICCV), 2019: 2272-2281. DOI:10.1109/ICCV.2019.00236 [55] Zeng, Ailing, et al. Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation[C]. 2021 IEEE/ CVF International Conference on Computer Vision(ICCV), 2021: 11416-11425. DOI:10.1109/ICCV48922.2021.01124 [56] Zheng, Ce, et al. 3D Human Pose Estimation with Spatial and Temporal Transformers[C]. 2021 IEEE/CVF International Conference on Computer Vision(ICCV), 2021: 11636-11645. DOI:10.1109/ICCV48922.2021.01145 [57] Li, Wenhao, et al. MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2021: 13137-13146. DOI:10.1109/CVPR52688.2022.01280 [58] Zhang, Jinlu, et al. MixSTE: Seq2seq Mixed Spatio- Temporal Encoder for 3D Human Pose Estimation in Video[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2022: 13222-13232. DOI:10.1109/CVPR52688.2022.01288 [59] Zhu, Wenjie, et al. MotionBERT: Unified Pretraining for Human Motion Analysis. ArXiv abs/2210. 06551, 2022: n. pag. DOI:10.48550/arXiv.2210.06551 [60] Zhang, Zhengyou. Microsoft Kinect Sensor and Its Effect[J]. IEEE Multim, 2012, 19: 4-10. DOI:10.1109/MMUL.2012.24 [61] 唐心宇, 宋爱国. 人体姿态估计及在康复训练情景交互中的应用[J]. 仪器仪表学报, 2018, 39(11): 195-203. DOI:10.19650/j.cnki.cjsi.J1803879 [62] Xiao, Bin, Haiping Wu, and Yichen Wei. Simple baselines for human pose estimation and tracking[C]. Proceedings of the European conference on computer vision( ECCV). 2018. DOI:10.1007/978-3-030-01231-1_29 [63] Li, Yanjie, et al. Tokenpose: Learning keypoint tokens for human pose estimation[C]. Proceedings of the IEEE/CVF International conference on computer vision. 2021. DOI:10.1109/ICCV48922.2021.01112 [64] Li, Wenbo, et al. Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv: 1901. 00148, 2019. DOI:10.1109/TPAMI.2019.2958916 [65] Zhang, Feng, et al. Distribution-aware coordinate representation for human pose estimation[C]. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. DOI:10.1109/cvpr42600.2020.00712 [66] Geng, Zigang, et al. Human Pose as Compositional Tokens[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. DOI:10.1109/CVPR52729.2023.00071 [67] Liu, Ze, et al. Swin transformer v2: Scaling up capacity and resolution[C]. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022. DOI:10.1109/CVPR52688.2022.01170 [68] Liu, Huajun, et al. Polarized self-attention: Towards high-quality pixel-wise regression. arXiv preprint arXiv: 2107. 00782, 2021. DOI:10.48550/arXiv.2107.00782 [69] Zhang, Jing, Zhe Chen, and Dacheng Tao. Towards high performance human keypoint detection[J]. International Journal of Computer Vision 129. 9, 2021: 2639-2662. DOI:10.1007/s11263-021-01482-8 [70] Zhang, Feng, et al. Distribution-Aware Coordinate Representation for Human Pose Estimation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2019: 7091-7100. DOI:10.1109/cvpr42600.2020.00712 [71] Xu, Yufei, et al. Vitpose: Simple vision transformer baselines for human pose estimation. arXiv preprint arXiv: 2204. 12484, 2022. DOI:10.48550/arXiv.2204.12484 [72] Dosovitskiy, Alexey, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv: 2010. 11929, 2020. DOI:10.48550/arXiv.2010.11929 [73] He, Kaiming, et al. Mask r-cnn. Proceedings of the IEEE international conference on computer vision. 2017. DOI:10.1109/ICCV.2017.322 [74] Papandreou, George, et al. Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model[C]. Proceedings of the European conference on computer vision(ECCV). 2018. DOI:10.1007/978-3-030-01264-9_17 [75] Yuan, Yuhui, et al. Hrformer: High-resolution transformer for dense prediction. arXiv preprint arXiv: 2110. 09408, 2021. DOI:10.1109/CVPR.2021.01300 [76] McNally, William, et al. Rethinking keypoint representations: Modeling keypoints and poses as objects for multi-person human pose estimation[C]. Computer Vision- ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part VI. Cham: Springer Nature Switzerland, 2022. DOI:10.1007/978-3-031-20068-7_3 [77] Li, Jiefeng, et al. Human pose regression with residual loglikelihood estimation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. DOI:10.1109/ICCV48922.2021.01084 [78] Shan, Wenkang, et al. Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation. ArXiv abs/2303. 11579, 2023: n. pag. DOI:10.48550/arXiv.2303.11579 [79] Loper, Matthew, et al. SMPL: A skinned multi-person linear model[J]. ACM transactions on graphics(TOG)34. 6, 2015: 1-16. DOI:10.1145/3596711.3596800 [80] Li, Yanjie, et al. SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation[C]. European Conference on Computer Vision, 2021. DOI:10.1007/978-3-031-20068-7_6 [81] Yang, Sen, et al. TransPose: Keypoint Localization via Transformer[C]. 2021 IEEE/CVF International Conference on Computer Vision(ICCV), 2020: 11782-11792. DOI:10.1109/ICCV48922.2021.01159 [82] Liu, Ruixu, et al. Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2020: 5063-5072. DOI:10.1109/cvpr42600.2020.00511