RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing

(arXiv:2407.01418)
Published Jul 1, 2024 in cs.RO, cs.AI, and cs.LG

Abstract

Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states, including particles and object-level latent physics information, from historical visuo-tactile observations and to perform future state predictions. Our tactile-informed dynamics model, learned from real-world data, can solve downstream robotics tasks with model-predictive control. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks, where the robot must infer the physics properties of objects from direct and indirect interactions. Trained on only an average of 30 minutes of real-world interaction data per task, our model can perform online adaptation and make touch-informed predictions. Through extensive evaluations in both long-horizon dynamics prediction and real-world manipulation, our method demonstrates superior effectiveness compared to previous learning-based and physics-based simulation systems.

Figure: Robot performing non-prehensile box pushing and dense packing, adapting to various objects and configurations.

Overview

  • The paper introduces RoboPack, a method that integrates visual and tactile feedback to learn dynamics models that predict object states, improving robotic manipulation in cluttered environments.

  • RoboPack uses a recurrent graph neural network (GNN) framework to fuse tactile and visual data, enhancing state estimation and dynamics prediction, and applies Model Predictive Path Integral (MPPI) control for action optimization.

  • Experimental results demonstrate RoboPack's superior performance in tasks like non-prehensile box pushing and dense packing, highlighting its capabilities in handling diverse and complex object interactions.

Overview of "RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing"

Introduction

The paper "RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing" addresses the complex challenge of robotic manipulation in highly occluded environments with multiple objects. The proposed method leverages a combination of visual and tactile feedback to build dynamic models that predict object states and facilitate effective manipulation and dense packing tasks. This is achieved by integrating these sensory inputs into a recurrent graph neural network (GNN) framework.

Methodology

Perception Module

The visual perception module processes RGB-D observations to output tracked 3D keypoints for each object, maintaining temporal consistency. Key elements include (combined in the sketch after this list):

  • Distance to Surface: Encouraging points to be close to object surfaces.
  • Semantic Alignment: Aligning multi-view interpolated DINOv2 features across frames.
  • Motion Regularization and Mask Consistency: These terms keep tracking robust to occlusions and consistent over time.
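
To make the tracking objective concrete, here is a minimal sketch of how these terms might be combined into a single loss. All names (`sdf`, `feats_pred`, `feats_ref`, `in_mask`) and the weights `w` are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def tracking_loss(points, points_prev, sdf, feats_pred, feats_ref,
                  in_mask, w=(1.0, 1.0, 0.1, 1.0)):
    """Hypothetical combination of the tracking terms above.
    sdf: callable mapping 3D points to signed distance from the object
    surface; feats_*: DINOv2 features interpolated at the points;
    in_mask: per-point probability of lying inside the object's 2D mask.
    """
    l_surface = sdf(points).abs().mean()                 # stay near the surface
    l_semantic = (feats_pred - feats_ref).pow(2).mean()  # align DINOv2 features
    l_motion = (points - points_prev).pow(2).mean()      # penalize large jumps
    l_mask = (1.0 - in_mask).mean()                      # stay inside the mask
    return w[0]*l_surface + w[1]*l_semantic + w[2]*l_motion + w[3]*l_mask
```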

For tactile perception, the Soft-Bubble tactile sensor provides 3D contact force data translated into embeddings, which are integrated into the particle-based scene representation. This setup effectively fuses tactile and visual data, providing an enhanced scene understanding critical for dense packing tasks.
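
As a rough illustration of this fusion step, the following sketch encodes a tactile reading into a fixed-size embedding and attaches it to the particle features; the encoder architecture and all dimensions are assumptions for illustration, not the paper's design:

```python
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Illustrative encoder: compresses a flattened Soft-Bubble reading
    into a compact embedding (sizes are placeholder assumptions)."""
    def __init__(self, in_dim=256, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, tactile_reading):
        return self.net(tactile_reading)

def fuse_tactile(particle_feats, tactile_emb):
    """Attach one tactile embedding to every particle's feature vector."""
    emb = tactile_emb.expand(particle_feats.shape[0], -1)
    return torch.cat([particle_feats, emb], dim=-1)
```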

State Estimation and Dynamics Prediction

The state estimator consumes a history of visuo-tactile interactions to improve the accuracy of estimated object states. It is auto-regressive: alongside particle states, it updates latent physics vectors that encapsulate the objects' physical properties. The dynamics prediction model conditions on these latent vectors to anticipate future states without requiring explicit prediction of future tactile readings.
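
A minimal sketch of this auto-regressive idea, assuming a GRU-style recurrent cell that folds each encoded visuo-tactile observation into a per-object latent physics vector (the cell choice and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

class LatentPhysicsEstimator(nn.Module):
    """Sketch: fold a history of encoded visuo-tactile observations into
    a latent physics vector (GRU cell and sizes are assumptions)."""
    def __init__(self, obs_dim=64, latent_dim=16):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, latent_dim)

    def forward(self, obs_history):
        # obs_history: (T, B, obs_dim) sequence of encoded observations
        z = torch.zeros(obs_history.shape[1], self.cell.hidden_size)
        for obs_t in obs_history:
            z = self.cell(obs_t, z)  # refine the latent with each step
        return z  # conditions the dynamics model; no tactile prediction needed
```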

The framework's dynamics are computed using GNNs, where nodes represent object particles and edges represent their interactions. This structure effectively models the intricate contact dynamics present in non-prehensile manipulation and dense packing tasks.
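
The following is one message-passing step of the kind such particle GNNs typically use; it is a generic sketch, not the paper's exact architecture, and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class ParticleGNNStep(nn.Module):
    """Generic message-passing step over a particle graph."""
    def __init__(self, node_dim=32, edge_dim=32):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, edge_dim))
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU(),
            nn.Linear(node_dim, node_dim))

    def forward(self, nodes, senders, receivers):
        # nodes: (N, node_dim); senders/receivers: (E,) particle indices
        msgs = self.edge_mlp(torch.cat([nodes[senders], nodes[receivers]], -1))
        agg = torch.zeros(nodes.shape[0], msgs.shape[-1], device=nodes.device)
        agg.index_add_(0, receivers, msgs)  # sum incoming messages per node
        return self.node_mlp(torch.cat([nodes, agg], -1))
```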

Model-Predictive Control

Utilizing the learned dynamics models, the system applies Model Predictive Path Integral (MPPI) control to plan and optimize actions. This approach minimizes a specified cost function over predicted states, ensuring the robot's actions lead to the desired manipulation outcomes.
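
A bare-bones MPPI loop conveys the idea: sample noisy action sequences, roll each through the learned dynamics, and return a cost-weighted average. Here `dynamics` and `cost` stand in for the learned model and task objective, and all hyperparameters are placeholders:

```python
import numpy as np

def mppi_plan(dynamics, cost, state, horizon=10, samples=256,
              action_dim=3, noise_std=0.1, temperature=1.0):
    """Minimal MPPI sketch over a learned dynamics model."""
    actions = np.random.randn(samples, horizon, action_dim) * noise_std
    costs = np.zeros(samples)
    for i in range(samples):
        s = state
        for t in range(horizon):
            s = dynamics(s, actions[i, t])  # predicted next state
            costs[i] += cost(s)             # e.g. MSE to the goal pose
    w = np.exp(-(costs - costs.min()) / temperature)  # softmin weights
    w /= w.sum()
    return (w[:, None, None] * actions).sum(axis=0)   # weighted action plan
```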

Experimental Setup

The effectiveness of RoboPack was validated on two primary tasks: non-prehensile box pushing and dense packing. For both, datasets were collected from real-world interactions via human teleoperation to ensure diverse and safe interaction behaviors.

Task Descriptions

Non-Prehensile Box Pushing:

  • Objective: Push a box to a goal pose, with different boxes having varied mass distributions.
  • Challenges: Identifying mass distributions through tactile feedback and handling object compliance and rotational slip.

Dense Packing:

  • Objective: Insert an additional object into a densely packed box, identifying feasible regions for insertion via tactile feedback.
  • Challenges: Handling significant occlusions and determining regions suitable for applying force.

Results

Offline Dynamics Prediction

RoboPack demonstrated superior predictive accuracy compared to baselines, including:

  • RoboPack (no tactile): Using visual data alone was less effective.
  • RoboCook + tactile: Directly using both visual and tactile observations as states proved suboptimal due to the complexities of predicting tactile readings.
  • Physics-based Simulators: These were limited by the sim-to-real gap, especially when handling the fine details of compliant interactions.

Real-World Planning Performance

RoboPack outperformed baselines in both precision (measured by reduced MSE to goal states) and efficacy (measured by success rates and fewer execution steps), demonstrating its capacity to handle objects with unknown physical properties efficiently. Dense packing results further validated RoboPack’s capabilities in handling diverse object geometries and properties.

Learned Physics Parameters

Analyzing the latent physics vectors revealed that these vectors effectively encapsulated object physical properties, such as mass distributions. As interaction histories grew, these latent vectors became better clustered by object type, significantly enhancing the predictive model’s accuracy.
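
One way to quantify such clustering, as a hypothetical analysis (not taken from the paper), is to score the latent vectors against object-type labels at increasing history lengths, e.g. with a silhouette score:

```python
import numpy as np
from sklearn.metrics import silhouette_score

def cluster_quality(latents_by_history, labels):
    """latents_by_history: {history_length: (N, latent_dim) array};
    labels: object-type label for each of the N latents.
    A rising score would indicate latents separating by object type."""
    return {t: silhouette_score(z, labels)
            for t, z in latents_by_history.items()}
```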

Implications and Future Work

Practical Implications

RoboPack's integration of tactile feedback and visual observations provides a robust framework for high-precision manipulation tasks in cluttered and occluded environments. This capability is crucial for applications in warehousing, logistics, and any domain requiring automated handling of objects in dense settings.

Theoretical Implications

The use of latent physics vectors and recurrent GNNs opens avenues for further research in tactile-informed dynamics modeling. The success of RoboPack suggests that extending these methods to other sensory modalities could further enhance robotic autonomy and adaptability, especially in unstructured environments.

Speculations on Future Developments

Future directions might explore enhanced high-fidelity particle modeling for subtle object deformations, more sophisticated trajectory optimization methods, and integrating broader physical priors. Additionally, adapting RoboPack to handle dynamic environments and interactive objects could further push the boundaries of autonomous robotic manipulation.

Conclusion

RoboPack represents a significant step in robotic interaction, efficiently integrating tactile and visual sensing for dynamic predictions and manipulation tasks. While demonstrating high performance in controlled experiments, its principles and methodologies promise broader applications and further research potential in tactile-enhanced robotic systems.
