DPOD: 6D Pose Object Detector and Refiner (1902.11020v3)

Published 28 Feb 2019 in cs.CV and cs.RO

Abstract: In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. Our method, named DPOD (Dense Pose Object Detector), estimates dense multi-class 2D-3D correspondence maps between an input image and available 3D models. Given the correspondences, a 6DoF pose is computed via PnP and RANSAC. An additional RGB pose refinement of the initial pose estimates is performed using a custom deep learning-based refinement scheme. Our results and comparison to a vast number of related works demonstrate that a large number of correspondences is beneficial for obtaining high-quality 6D poses both before and after refinement. Unlike other methods that mainly use real data for training and do not train on synthetic renderings, we perform evaluation on both synthetic and real training data demonstrating superior results before and after refinement when compared to all recent detectors. While being precise, the presented approach is still real-time capable.

Summary

The paper introduces DPOD, a deep learning approach that uses dense 2D-3D correspondences to accurately estimate 6D object poses.
It is evaluated with both synthetic and real training data and reports better results than recent detectors in both settings.
The method leverages an encoder-decoder network with PnP and RANSAC for pose computation, validated on benchmark datasets like LineMOD and OCCLUSION.

DPOD: 6D Pose Object Detector and Refiner

The paper "DPOD: 6D Pose Object Detector and Refiner" introduces DPOD, a deep learning-based approach for 3D object detection and 6D pose estimation from RGB images. The method estimates dense 2D-3D correspondence maps between the input image and the available 3D models, and the pose is then computed from these correspondences with PnP and RANSAC. Unlike prior methods that rely mainly on real training data, the authors evaluate the approach with both synthetic and real training data and report superior results in both settings.

Method Overview

DPOD consists of two core components: the correspondence block and pose block. The correspondence block leverages an encoder-decoder network architecture to predict multi-class object ID masks and dense 2D-3D correspondences. This architecture employs a pixel-wise classification approach for correspondence prediction, which enhances the quality and robustness of pose estimation. The pose block utilizes PnP and RANSAC to compute the 6DoF pose from the estimated correspondences.
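As a rough illustration of the pose block, the sketch below wraps a simple linear (DLT) PnP solver in a RANSAC loop using only NumPy. The summary does not say which PnP solver the authors use, so the DLT solver, the function names, and the default iteration count and pixel threshold are all assumptions made for this example; in practice a library routine such as OpenCV's `cv2.solvePnPRansac` would be the more usual choice.

```python
import numpy as np

def pnp_dlt(pts3d, pts2d, K):
    """Linear (DLT) pose from >= 6 non-coplanar 2D-3D correspondences."""
    n = len(pts3d)
    # Normalised image coordinates: K^-1 [u, v, 1]^T
    xn = np.linalg.solve(K, np.c_[pts2d, np.ones(n)].T).T
    Xh = np.c_[pts3d, np.ones(n)]  # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, [1]] * Xh
    # Null vector of A gives [R | t] up to scale and sign
    P = np.linalg.svd(A, full_matrices=False)[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:
        P = -P  # pick the sign that yields a proper rotation
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt            # nearest rotation matrix
    t = P[:, 3] / S.mean()  # undo the unknown scale
    return R, t

def reprojection_error(pts3d, pts2d, K, R, t):
    """Per-point pixel distance between projected model points and pts2d."""
    proj = (pts3d @ R.T + t) @ K.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:3] - pts2d, axis=1)

def pnp_ransac(pts3d, pts2d, K, iters=200, thresh_px=2.0, seed=0):
    """Hypothesise poses from random 6-point samples, keep the largest
    consensus set, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts3d), size=6, replace=False)
        R, t = pnp_dlt(pts3d[idx], pts2d[idx], K)
        inliers = reprojection_error(pts3d, pts2d, K, R, t) < thresh_px
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise ValueError("no consensus set found")
    R, t = pnp_dlt(pts3d[best], pts2d[best], K)
    return R, t, best
```

Here `pts3d` is an (N, 3) array of model points, `pts2d` the matching (N, 2) pixel coordinates, and `K` the 3x3 camera intrinsics. A dense correspondence map supplies many more pairs than the six a single hypothesis needs, which is what lets the consensus step discard wrong matches.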

Novel Contributions

Dense Correspondence Mapping: The method focuses on generating dense correspondence maps, which are pivotal for accurate pose estimation. This approach contrasts with previous methods that rely on a limited set of correspondences or anchor points, such as bounding box corners.
Synthetic and Real Training Data: The paper shows that the approach can be trained on either synthetic renderings or real images, and it reports strong results in both settings.
Deep Learning-Based Refinement: An additional deep learning-based refiner further enhances the initial pose estimates. This refiner utilizes a novel architecture that combines feature extraction from input RGB images and corresponding synthetic renderings of the detected object, improving the pose accuracy.
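To make the dense correspondence mapping in the first item more concrete, the sketch below decodes per-pixel classification outputs into 2D-3D correspondences for one object. The tensor layout, the encoding of correspondences as two discretised texture-coordinate channels (U and V), and the `uv_to_xyz` lookup table are assumptions made for illustration; this summary does not spell out the exact encoding.

```python
import numpy as np

def decode_correspondences(id_logits, u_logits, v_logits, uv_to_xyz, obj_id):
    """Turn per-pixel class scores into 2D-3D correspondences for one object.

    id_logits: (num_objects + 1, H, W), class 0 = background (assumed layout)
    u_logits, v_logits: (n_bins, H, W) scores over discretised U and V values
    uv_to_xyz: (n_bins, n_bins, 3) hypothetical lookup from a (u, v) bin
               to a 3D point on the object model
    """
    # Pixel-wise classification: pick the most likely class per pixel
    id_mask = id_logits.argmax(axis=0)
    u_map = u_logits.argmax(axis=0)
    v_map = v_logits.argmax(axis=0)

    # Keep only pixels assigned to the requested object
    ys, xs = np.nonzero(id_mask == obj_id)
    pts2d = np.stack([xs, ys], axis=1).astype(np.float64)  # (x, y) pixels

    # Each (u, v) pair indexes a 3D model point
    pts3d = uv_to_xyz[u_map[ys, xs], v_map[ys, xs]]
    return pts2d, pts3d
```

Every pixel inside the object mask yields one correspondence, so even a small visible region gives many more 2D-3D pairs than a handful of bounding-box corners would.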

Experimental Validation

DPOD was evaluated on the LineMOD and OCCLUSION benchmark datasets, standard testbeds for pose estimation tasks. The results indicate significant improvements over existing methods, particularly in complex scenes with occlusions and when limited real training data is available. The reported ADD score, especially after applying the refinement, underscores the method's efficacy over competitive approaches such as PoseCNN and SSD-6D.
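For readers unfamiliar with the ADD score, a minimal sketch of how it is commonly computed follows: the average distance between model points transformed by the ground-truth pose and by the predicted pose. The acceptance threshold of 10% of the model diameter is a widely used convention on LineMOD and is an assumption here, since this summary does not state the threshold; the function names are likewise illustrative.

```python
import numpy as np

def add_metric(model_pts, R_gt, t_gt, R_pred, t_pred):
    """Average distance between model points under the two poses."""
    gt = model_pts @ R_gt.T + t_gt
    pred = model_pts @ R_pred.T + t_pred
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def model_diameter(model_pts):
    """Largest pairwise distance between model points.

    Uses O(N^2) memory, which is fine for a few thousand points.
    """
    diff = model_pts[:, None, :] - model_pts[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).max())

def is_pose_correct(model_pts, R_gt, t_gt, R_pred, t_pred, frac=0.1):
    """Pose counts as correct if ADD is below frac * diameter (assumed 10%)."""
    add = add_metric(model_pts, R_gt, t_gt, R_pred, t_pred)
    return bool(add < frac * model_diameter(model_pts))
```

The benchmark score is then the fraction of test images for which the predicted pose passes this check.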

Implications and Future Work

The implications of DPOD are substantial for applications requiring precise 3D object detection and pose estimation, such as in augmented reality and robotics. By leveraging dense correspondence maps and integrating synthetic and real data for training, DPOD pushes the limits of what is achievable with RGB inputs alone.

Future work could make the pipeline more efficient, both in the network architectures and in the number of RANSAC iterations needed for pose computation. The approach could also be extended to more intricate scenes with heavier clutter and dynamic elements.

Overall, while not revolutionary, DPOD offers significant advancements in the field of computer vision, enhancing the accuracy and robustness of 6D pose estimation techniques.
