Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics (2401.05412v1)

Published 27 Dec 2023 in cs.CV, cs.AI, and eess.SP

Abstract: Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Certain methodologies employ sparse Inertial Measurement Units (IMUs) on the human body and harness data-driven strategies to model human poses. However, the reconstruction of motion based solely on sparse IMU data is inherently fraught with ambiguity, a consequence of numerous identical IMU readings corresponding to different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate that our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.
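As a rough illustration of the two ideas named in the abstract, the sketch below shows (1) per-IMU uncertainty weighting, where a predicted log-variance down-weights unreliable sensors, and (2) a CLIP-style symmetric contrastive loss aligning motion-clip embeddings with action-text embeddings. This is a minimal sketch, not the authors' implementation: the module names, feature sizes, per-IMU input layout, and the mean-pooling stand-in for the Hierarchical Temporal Transformer are all assumptions.

```python
# Hedged sketch (not the paper's code): uncertainty-weighted IMU fusion and
# contrastive IMU-text alignment. Shapes and layouts are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UncertaintyWeightedIMUFusion(nn.Module):
    """Predicts a log-variance per IMU and down-weights uncertain sensors."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(12, feat_dim)   # assumed per-IMU input: 9D rotation + 3D acceleration
        self.log_var = nn.Linear(feat_dim, 1)    # aleatoric-style uncertainty head

    def forward(self, imu: torch.Tensor) -> torch.Tensor:
        # imu: (batch, time, num_imus, 12)
        feats = self.encoder(imu)                 # (B, T, N, D)
        log_var = self.log_var(feats)             # (B, T, N, 1)
        weights = torch.softmax(-log_var, dim=2)  # lower variance -> higher weight
        return (weights * feats).sum(dim=2)       # fused per-frame feature (B, T, D)


def contrastive_alignment_loss(imu_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pairing each motion clip with its action text."""
    imu_emb = F.normalize(imu_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = imu_emb @ text_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    fusion = UncertaintyWeightedIMUFusion()
    imu = torch.randn(4, 30, 6, 12)        # 4 clips, 30 frames, 6 IMUs (assumed sparse setup)
    fused = fusion(imu)                    # (4, 30, 64)
    clip_emb = fused.mean(dim=1)           # naive temporal pooling stands in for the HTT
    text_emb = torch.randn(4, 64)          # placeholder for a text-encoder output
    print(contrastive_alignment_loss(clip_emb, text_emb))
```

In such a setup, the uncertainty weights give the model a data-driven notion of which sensors matter spatially for a given action, while the contrastive term pulls IMU features toward the text description of that action; both choices here follow the abstract's description rather than any published code.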
