A Survey on 3D Egocentric Human Pose Estimation (2403.17893v2)

Published 26 Mar 2024 in cs.CV

Abstract: Egocentric human pose estimation aims to estimate human body poses and develop body representations from a first-person camera perspective. It has gained vast popularity in recent years because of its wide range of applications in areas such as XR technologies, human-computer interaction, and fitness tracking. However, to the best of our knowledge, there is no systematic literature review of the solutions proposed for egocentric 3D human pose estimation. To that end, this survey aims to provide an extensive overview of the current state of egocentric pose estimation research. We categorize and discuss the popular datasets and the different pose estimation models, highlighting the strengths and weaknesses of different methods through comparative analysis. This survey can be a valuable resource for both researchers and practitioners in the field, offering insights into key concepts and cutting-edge solutions in egocentric pose estimation, its wide-ranging applications, and the open problems and directions for future work.
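
To make the abstract's "comparative analysis" concrete: 3D pose estimation methods, egocentric ones included, are most commonly compared by the mean per-joint position error (MPJPE), the average Euclidean distance between predicted and ground-truth 3D joint positions, usually computed after translating both skeletons to a common root joint. The short Python sketch below illustrates the metric; the 15-joint skeleton, root index, and millimeter units are illustrative assumptions, not details taken from the paper.

import numpy as np

def mpjpe(pred, gt, root_idx=0, align_root=True):
    """Mean Per-Joint Position Error in the units of the input (e.g., mm).

    pred, gt: arrays of shape (J, 3) with 3D joint positions.
    If align_root is True, both skeletons are translated so their root
    joints coincide first (root-relative evaluation, as commonly
    reported on 3D human pose benchmarks).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if align_root:
        pred = pred - pred[root_idx]
        gt = gt - gt[root_idx]
    # Euclidean distance per joint, then the mean over all joints.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy usage with a hypothetical 15-joint skeleton (values in mm):
rng = np.random.default_rng(0)
gt_pose = rng.normal(scale=300.0, size=(15, 3))
pred_pose = gt_pose + rng.normal(scale=20.0, size=(15, 3))
print(f"MPJPE: {mpjpe(pred_pose, gt_pose):.1f} mm")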
