Self-supervised 6D Object Pose Estimation for Robot Manipulation (1909.10159v2)

Published 23 Sep 2019 in cs.RO

Abstract: To teach robots skills, it is crucial to obtain data with supervision. Since annotating real world data is time-consuming and expensive, enabling robots to learn in a self-supervised way is important. In this work, we introduce a robot system for self-supervised 6D object pose estimation. Starting from modules trained in simulation, our system is able to label real world images with accurate 6D object poses for self-supervised learning. In addition, the robot interacts with objects in the environment to change the object configuration by grasping or pushing objects. In this way, our system is able to continuously collect data and improve its pose estimation modules. We show that the self-supervised learning improves object segmentation and 6D pose estimation performance, and consequently enables the system to grasp objects more reliably. A video showing the experiments can be found at https://youtu.be/W1Y0Mmh1Gd8.

Citations (171)

Summary

  • The paper introduces a self-supervised framework that enables robots to autonomously collect and label real-world data for accurate 6D object pose estimation.
  • It integrates simulated training with autonomous real-world interactions using RGB-D cameras and PoseRBPF for robust pose initialization and tracking.
  • Quantitative evaluations show significant improvements in segmentation F1 scores and grasping success rates, underscoring its impact on robotic manipulation.

An Overview of Self-supervised 6D Object Pose Estimation for Robot Manipulation

In this paper, Deng et al. introduce a novel robot system to address the labor-intensive task of data collection for learning-based robotic manipulation by employing self-supervised learning for 6D object pose estimation. The primary focus of their work is on enabling a robot to autonomously interact with and label real-world data, thereby assembling a high-quality dataset for continuous improvement of its manipulation capabilities without human intervention.

System Architecture and Methodology

The proposed system extends state-of-the-art 6D object pose estimation techniques, combining modules for object segmentation and pose estimation based on PoseRBPF, a Rao-Blackwellized particle filter. These components are initially trained in a simulated environment. Self-supervised data collection then proceeds autonomously: the robot labels real-world images while interacting with objects through grasping and pushing actions. This interaction not only diversifies the collected data but also helps close the domain gap that arises when models trained in simulation are deployed in the real world.
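At a high level, the collect-then-interact cycle described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; `estimate_pose`, `is_confident`, and `interact` are hypothetical stand-ins for the pose estimator, a label-validation check, and the grasp/push primitives:

```python
def collect_self_supervised_data(scene, n_rounds, estimate_pose, is_confident, interact):
    """Outer self-supervision loop (hypothetical helper names): label the
    current scene, keep only confident estimates as pseudo-labels, then
    grasp or push an object so the next round observes a new configuration."""
    dataset = []
    for _ in range(n_rounds):
        pose = estimate_pose(scene)
        if is_confident(scene, pose):
            dataset.append((scene, pose))   # accepted pseudo-label
        scene = interact(scene)             # grasp/push changes the object layout
    return dataset
```

The key design point is that rejected labels are simply discarded rather than corrected by a human, so only high-confidence annotations ever enter the training set.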

Pose Initialization and Tracking: The robot uses an RGB-D camera for visual perception. It initializes the 6D poses of objects from an encoding of the current observation, then refines them by matching depth measurements against the object models to achieve high localization accuracy. Continuous pose tracking during robot motion enables robust operation in dynamic environments.
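A heavily simplified sketch of one Rao-Blackwellized particle filter update in the spirit of PoseRBPF: particles carry candidate 3D translations, while the rotation is marginalized analytically as a discrete distribution over a learned rotation codebook. Here `encode_crop` (the latent code of the image crop centred at a candidate translation) and the codebook contents are assumptions standing in for the paper's trained autoencoder:

```python
import numpy as np

def rbpf_update(particles, encode_crop, codebook, motion_std=0.01, rng=None):
    """One simplified RBPF step. particles: (N, 3) candidate translations;
    encode_crop(t): latent code of the crop at translation t (hypothetical);
    codebook: (K, D) latent codes for K discretized rotations."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Motion model: diffuse translations with Gaussian noise.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    weights = np.empty(len(particles))
    rot_posts = []
    for i, t in enumerate(particles):
        code = encode_crop(t)
        # Cosine similarity between the crop code and every rotation code.
        sims = codebook @ code / (
            np.linalg.norm(codebook, axis=1) * np.linalg.norm(code) + 1e-9)
        rot_post = np.exp(sims - sims.max())
        rot_post /= rot_post.sum()      # per-particle rotation distribution
        rot_posts.append(rot_post)
        weights[i] = sims.max()         # particle weight from best rotation match
    weights = np.exp(weights - weights.max())
    weights /= weights.sum()
    # Resample translations proportionally to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], [rot_posts[j] for j in idx]
```

Conditioning the rotation distribution on each sampled translation is what makes the filter Rao-Blackwellized: only the translation is sampled, so far fewer particles are needed than for a joint 6D sample.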

Self-supervised Training: The robot iteratively fine-tunes its underlying neural networks (e.g., PoseCNN for segmentation, autoencoders for orientation estimation) using self-acquired annotations, reducing reliance on labor-intensive manual labeling or purely synthetic data. Each round builds on the successfully annotated data from previous rounds, enhancing the reliability of model-based grasping.
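The fine-tuning cycle reduces to a short loop: the current model gathers and labels new data, a validation check (e.g., depth consistency with the object model) filters the pseudo-labels, and only accepted pairs update the networks. A minimal sketch with hypothetical `collect`, `accept`, and `train_step` callables:

```python
def self_supervised_finetune(model, rounds, collect, train_step, accept):
    """Hypothetical fine-tuning loop: each round, the current model labels
    newly collected views; only labels passing the acceptance check are
    used to update the model, so errors do not compound across rounds."""
    for _ in range(rounds):
        for x, y in collect(model):      # robot gathers and labels new views
            if accept(x, y):             # e.g. depth-consistency validation
                model = train_step(model, x, y)
    return model
```

Because the improved model produces better labels in the next round, the loop is self-reinforcing, which is the core of the paper's continuous-improvement claim.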

Results and Implications

The experiments demonstrate substantial improvements in both object segmentation and 6D pose estimation accuracy, with self-supervised learning bridging the gap between synthetic training data and real-world deployment. Quantitative results show significant gains in semantic segmentation F1 score, with particularly strong improvements on objects that were previously difficult to segment. Moreover, incorporating the self-annotated real-world data led to more robust grasping in real environments, with higher success rates and reduced time for pose initialization and task execution.

This research underscores the potential of self-supervised learning frameworks in robotics, pointing towards an era of robots capable of life-long learning. The presented system not only mitigates the steep costs associated with manual data labeling but also facilitates continued performance improvements through autonomous data acquisition. The implications for this are vast, notably in fields where robots must operate in dynamic and unpredictable environments.

Future Directions

Although the accomplishments of this work are notable, future research should focus on expanding the diversity of objects and scenes tackled by this system, thereby ensuring comprehensive robustness and adaptability. Furthermore, integrating this framework with tactile sensing for improved grasp planning and execution, and exploring multi-robot scenarios for collaborative perception and manipulation tasks could significantly advance the scope of self-supervised robotic systems.

In conclusion, Deng et al.'s contribution to the domain of robotic manipulation opens new avenues for developing more intelligent and autonomous systems capable of sophisticated human-like interaction with the physical world. This work provides a compelling foundation for future advancements in creating self-improving robotic technologies.
