Hybrid model for Single-Stage Multi-Person Pose Estimation (2305.01167v2)
Abstract: Human pose estimation methods are generally categorized by architecture into two approaches: regression (i.e., heatmap-free) and heatmap-based methods. The former directly estimates the precise coordinates of each keypoint using convolutional and fully-connected layers. Although this approach can detect overlapping and densely placed keypoints, it can produce unexpected results for keypoints that do not exist in the scene. The latter, on the other hand, can filter out non-existent keypoints by using the heatmap predicted for each keypoint. Nevertheless, it suffers from quantization error when keypoint coordinates are obtained from the heatmaps. In addition, unlike the regression approach, it has difficulty distinguishing densely placed keypoints in an image. To this end, we propose a hybrid model for single-stage multi-person pose estimation, named HybridPose, which overcomes the drawbacks of both approaches by leveraging the strengths of each. Furthermore, we introduce a self-correlation loss to inject spatial dependencies between keypoint coordinates and their visibility. As a result, HybridPose is capable of not only detecting densely placed keypoints but also filtering out non-existent keypoints in an image. Experimental results demonstrate that the proposed HybridPose provides keypoint visibility without degrading pose estimation accuracy.
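The two heatmap properties the abstract contrasts, quantization error and the ability to reject non-existent keypoints, can be illustrated with a minimal sketch. It is a generic argmax heatmap decoder in plain Python, not HybridPose itself. The stride of 4, the 64×64 heatmap size, the visibility threshold of 0.5, the example keypoint location, and all function names are assumptions chosen for illustration.

```python
import math

def gaussian_heatmap(cx, cy, width, height, sigma=1.0):
    """Render a 2D Gaussian centred on the continuous point (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_argmax(heatmap, stride):
    """Return (x, y, peak): the hottest cell mapped back to input pixels."""
    peak, best_x, best_y = -1.0, 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > peak:
                peak, best_x, best_y = value, x, y
    return best_x * stride, best_y * stride, peak

STRIDE = 4            # assumed downsampling factor between image and heatmap
VIS_THRESHOLD = 0.5   # assumed peak score below which a keypoint is rejected

# Hypothetical ground-truth keypoint in input-image pixels.
true_x, true_y = 101.0, 59.0

# Heatmap for a keypoint that is present: decoding snaps to the cell grid,
# so the recovered location differs from the true one (quantization error).
heatmap = gaussian_heatmap(true_x / STRIDE, true_y / STRIDE, width=64, height=64)
x_hat, y_hat, peak = decode_argmax(heatmap, STRIDE)
error = math.hypot(x_hat - true_x, y_hat - true_y)
is_visible = peak >= VIS_THRESHOLD

# Heatmap for a keypoint that is absent: a flat, low response everywhere,
# so thresholding the peak filters it out.
empty = [[0.01] * 64 for _ in range(64)]
_, _, empty_peak = decode_argmax(empty, STRIDE)
empty_visible = empty_peak >= VIS_THRESHOLD

print(f"decoded=({x_hat}, {y_hat}) error={error:.3f}px visible={is_visible}")
print(f"absent keypoint visible={empty_visible}")
```

In this sketch the decoded location is off by up to half a stride per axis, which is the kind of error a direct coordinate regressor avoids. A regressor, however, always outputs some coordinate and has no peak score with which to reject an absent keypoint.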