SHARE: Single-view Human Adversarial REconstruction (2401.00343v1)
Abstract: The accuracy of 3D Human Pose and Shape (HPS) reconstruction from a single image is steadily improving. Yet no known method is robust to all image distortions. To address errors caused by variations in camera pose, we introduce SHARE, a novel fine-tuning method that uses adversarial data augmentation to enhance the robustness of existing HPS techniques. We perform a comprehensive analysis of the impact of camera pose on HPS reconstruction outcomes. We first generate large-scale image datasets captured systematically from diverse camera perspectives. We then establish a mapping between camera poses and reconstruction errors as a continuous function that characterizes the relationship between camera pose and HPS quality. Leveraging this representation, we introduce RoME (Regions of Maximal Error), a novel sampling technique for our adversarial fine-tuning method. The SHARE framework generalizes across single-view HPS methods, and we demonstrate its performance on HMR, SPIN, PARE, CLIFF and ExPose. Our results show a reduction in mean joint errors for these single-view HPS techniques on images captured from multiple camera positions, without compromising their baseline performance. In many challenging cases, our method surpasses the performance of existing models, highlighting its practical significance for diverse real-world applications.
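The abstract describes two steps: turning reconstruction errors measured at discrete camera poses into a continuous function, and sampling the regions where that function is largest (RoME) to drive fine-tuning. The exact parameterization and sampling rule are not given here, so the following is only a minimal sketch under stated assumptions: camera pose is reduced to azimuth and elevation in degrees, the continuous function is a bilinear interpolation of a per-pose error grid, and a "region of maximal error" is taken to be every pose whose interpolated error exceeds `lo + level * (hi - lo)` of the probed error range. The names `make_error_surface` and `sample_rome`, their parameters, and the grid values are hypothetical and illustrative, not the authors' method or results.

```python
import bisect
import random


def make_error_surface(azimuths, elevations, errors):
    """Return a continuous function err(az, el) of camera azimuth/elevation
    (degrees), built by bilinear interpolation of errors measured on a grid.

    Assumptions: `azimuths` is sorted and starts at 0 (wrapping at 360),
    `elevations` is sorted with at least two entries, and errors[i][j] was
    measured at (azimuths[i], elevations[j]).
    """
    n_az, n_el = len(azimuths), len(elevations)

    def err(az, el):
        az = az % 360.0
        el = min(max(el, elevations[0]), elevations[-1])  # clamp to measured range
        # Azimuth cell [a0, a1); the last cell wraps around to azimuths[0] + 360.
        i0 = bisect.bisect_right(azimuths, az) - 1
        i1 = (i0 + 1) % n_az
        a0 = azimuths[i0]
        a1 = azimuths[i1] + (360.0 if i1 == 0 else 0.0)
        t = (az - a0) / (a1 - a0)
        # Elevation cell [e0, e1]; the top edge falls into the last cell.
        j0 = min(bisect.bisect_right(elevations, el) - 1, n_el - 2)
        e0, e1 = elevations[j0], elevations[j0 + 1]
        u = (el - e0) / (e1 - e0)
        return ((1 - t) * (1 - u) * errors[i0][j0] + t * (1 - u) * errors[i1][j0]
                + (1 - t) * u * errors[i0][j0 + 1] + t * u * errors[i1][j0 + 1])

    return err


def sample_rome(err, az_range, el_range, level=0.8, n=8,
                probe=2000, max_tries=100_000, seed=0):
    """Hypothetical RoME-style sampler: keep uniformly drawn camera poses whose
    error lies above lo + level * (hi - lo), with 0 <= level < 1."""
    rng = random.Random(seed)

    def draw():
        return rng.uniform(*az_range), rng.uniform(*el_range)

    # Estimate the error range over the pose domain by random probing.
    probes = [err(*draw()) for _ in range(probe)]
    lo, hi = min(probes), max(probes)
    threshold = lo + level * (hi - lo)
    # Rejection sampling restricted to the high-error region.
    poses = []
    for _ in range(max_tries):
        az, el = draw()
        if err(az, el) >= threshold:
            poses.append((az, el))
            if len(poses) == n:
                return poses, threshold
    raise RuntimeError("high-error region too small; lower `level`")


# Illustrative grid (made-up values, not results from the paper):
# rows are azimuths, columns are elevations; entries are joint errors in mm.
AZ = [0.0, 90.0, 180.0, 270.0]
EL = [-30.0, 0.0, 30.0, 60.0]
ERRORS = [
    [60, 62, 75, 110],
    [65, 68, 80, 120],
    [70, 72, 90, 140],
    [64, 66, 78, 118],
]
err = make_error_surface(AZ, EL, ERRORS)
poses, threshold = sample_rome(err, (0.0, 360.0), (-30.0, 60.0), level=0.8, n=5)
print(f"threshold={threshold:.1f} mm")
for az, el in poses:
    print(f"az={az:6.1f}  el={el:5.1f}  err={err(az, el):6.1f}")
```

For this toy grid, every sampled pose lies above 30 degrees of elevation, where the made-up errors are largest. In a fine-tuning loop along the lines the abstract describes, such poses would be used to render new training images for the HPS model; rendering and training are outside this sketch.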