HMP: Hand Motion Priors for Pose and Shape Estimation from Video (2312.16737v1)
Abstract: Understanding how humans interact with the world requires accurate 3D hand pose estimation, a task complicated by the hand's high degree of articulation, frequent occlusions and self-occlusions, and rapid motions. While most existing methods rely on single-image inputs, videos provide useful cues for addressing these issues. However, existing video-based 3D hand datasets are insufficient for training feedforward models that generalize to in-the-wild scenarios. In contrast, large human motion capture datasets that also include hand motions, such as AMASS, are readily available. We therefore develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions. This motion prior is then used for video-based 3D hand motion estimation through latent optimization. Integrating a robust motion prior significantly improves performance, especially in occluded scenarios, and produces stable, temporally consistent results that surpass conventional single-frame methods. We demonstrate our method's efficacy through qualitative and quantitative evaluations on the HO3D and DexYCB datasets, with special emphasis on an occlusion-focused subset of HO3D. Code is available at https://hmp.is.tue.mpg.de
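The idea of "latent optimization with a motion prior" can be illustrated with a deliberately tiny sketch. It assumes a hypothetical fixed linear decoder (a stand-in for the learned generative prior, which in HMP is trained on AMASS) mapping a 2-D latent code to a single joint-angle trajectory, and a Gaussian penalty on the latent as the prior term. Only the latent code is optimized; occluded frames contribute no data term, so their values come from the decoder's structure. The names `decoder_basis`, `decode`, and `fit_latent` are illustrative only and do not come from the HMP code release.

```python
from typing import List, Sequence


# Hypothetical toy "motion prior": a fixed linear decoder mapping a 2-D latent
# code z to a T-frame trajectory of one joint angle:
#   x_t = z[0] * 1 + z[1] * (t / (T - 1))
# A learned generative model would replace this stand-in in a real system.
def decoder_basis(num_frames: int) -> List[List[float]]:
    """Row t holds the decoder weights for frame t."""
    return [[1.0, t / (num_frames - 1)] for t in range(num_frames)]


def decode(z: Sequence[float], basis: List[List[float]]) -> List[float]:
    """Map the latent code to a per-frame trajectory."""
    return [sum(b * zk for b, zk in zip(row, z)) for row in basis]


def fit_latent(
    observations: Sequence[float],
    visible: Sequence[bool],
    lam: float = 1e-3,
    lr: float = 0.05,
    steps: int = 2000,
) -> List[float]:
    """Gradient descent on the latent code z.

    Objective: sum over visible frames of (decode(z)_t - y_t)^2
    plus lam * ||z||^2 (an assumed Gaussian prior on the latent).
    Occluded frames add no data term and are filled in by the decoder.
    """
    basis = decoder_basis(len(observations))
    z = [0.0, 0.0]
    for _ in range(steps):
        pred = decode(z, basis)
        # Gradient of the prior term.
        grad = [2.0 * lam * zk for zk in z]
        # Gradient of the data term, visible frames only.
        for row, p, y, vis in zip(basis, pred, observations, visible):
            if not vis:
                continue  # occluded frame: nothing observed to fit
            residual = p - y
            for k, b in enumerate(row):
                grad[k] += 2.0 * residual * b
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z


# Ground-truth trajectory is 1.0, 1.4, 1.8, 2.2, 2.6, 3.0; frames 2 and 3 are occluded.
obs = [1.0, 1.4, 0.0, 0.0, 2.6, 3.0]  # values at occluded frames are ignored
vis = [True, True, False, False, True, True]
z_fit = fit_latent(obs, vis)
traj = decode(z_fit, decoder_basis(len(obs)))
print([round(x, 3) for x in traj])
```

Because the estimate lives in the decoder's latent space, the recovered values at the occluded frames 2 and 3 should land close to 1.8 and 2.2 even though they were never observed. This is the intuition behind using a motion prior to handle occlusion, here reduced to a linear least-squares problem.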