
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

(2312.01853)
Published Dec 4, 2023 in cs.RO, cs.CV, and cs.LG

Abstract

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

Overview

  • The paper presents Robot Synesthesia, a method to fuse tactile and visual data for robotic in-hand manipulation.

  • Robot Synesthesia represents tactile data as a point cloud, integrating it with a visual point cloud for a cohesive sensory experience.

  • A two-stage training pipeline with reinforcement learning is used, involving a teacher policy and a student policy that processes integrated sensory data with a PointNet encoder.

  • Experiments demonstrate the system's effectiveness in simulated in-hand rotation tasks and its ability to transfer to real-world applications.

  • The approach shows potential for enhancing robotic manipulation abilities and for adaptation to complex environments.

Introduction

Executing contact-rich manipulation tasks with robotic systems requires a nuanced integration of sensory inputs, specifically vision and touch. Fusing these modalities is complicated by their fundamentally different natures: tactile information tends to be sparse and low-dimensional, providing localized contact data, whereas visual feedback is usually dense and high-dimensional, offering a wide array of environmental cues. The challenge lies not only in processing these dissimilar data streams effectively but also in integrating them so that a robot can perform informed, dexterous manipulation.

Visuotactile Representation

To address the integration challenge, a novel approach named Robot Synesthesia is introduced. It draws inspiration from human tactile-visual synesthesia, in which certain individuals perceive colors when they touch objects. Robot Synesthesia represents tactile data from Force-Sensing Resistor (FSR) sensors as a point cloud, which is combined with a camera-generated point cloud in a single three-dimensional space. This representation preserves the spatial relationships between the robot's links, sensors, and objects, melding vision and touch into a cohesive sensory experience. Tactile point clouds are easy to generate in both simulated and real-world settings, which benefits in-hand manipulation by narrowing the Sim2Real transfer gap and improving spatial reasoning.
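The summary does not include code, but the core idea can be illustrated with a minimal sketch. Assuming each FSR reading is thresholded into contact/no-contact and each sensor's 3D position is obtained from the hand's forward kinematics, the tactile points can simply be appended to the camera point cloud, with an extra channel marking the modality. The function names, the threshold, and the modality flag below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tactile_point_cloud(fsr_readings, sensor_positions, contact_threshold=0.1):
    """Keep one 3D point per FSR sensor that registers contact.

    fsr_readings     : (N,) normalized sensor values in [0, 1]
    sensor_positions : (N, 3) sensor locations in the world frame
                       (e.g. from the hand's forward kinematics).
    """
    in_contact = fsr_readings > contact_threshold
    return sensor_positions[in_contact]

def fuse_point_clouds(visual_points, tactile_points):
    """Merge camera and tactile points into one cloud in a shared 3D frame.

    A binary modality channel is appended so a downstream point-cloud
    encoder can still distinguish vision from touch.
    """
    vis = np.hstack([visual_points,
                     np.zeros((len(visual_points), 1), dtype=np.float32)])  # flag 0 = vision
    tac = np.hstack([tactile_points,
                     np.ones((len(tactile_points), 1), dtype=np.float32)])  # flag 1 = touch
    return np.vstack([vis, tac]).astype(np.float32)
```

Because both modalities end up as points in the same frame, the fused cloud can be fed to a single point-cloud encoder rather than requiring separate branches for vision and touch.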

Training Pipeline

The training process has two stages. In the first stage, a 'teacher' policy is trained in simulation with reinforcement learning (RL), using privileged state information such as the robot's joint positions and the object pose. This teacher then supervises a 'student' policy that observes only the tactile and visual point cloud data. The student is first trained with Behavior Cloning on trajectories collected by the teacher and then refined with Dataset Aggregation (DAgger). A PointNet encoder over the visual and tactile inputs forms the backbone of the student policy, allowing the system to process the integrated sensory data.
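As a rough sketch of how such a student might be wired up, the snippet below pairs a simplified PointNet-style encoder (a shared per-point MLP followed by max-pooling) with a small action head, and shows a single Behavior Cloning step that regresses the student's actions onto the teacher's; DAgger would then relabel states visited by the student with teacher actions and add them to the same objective. The class names, feature sizes, proprioception input, and MSE loss are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Simplified PointNet: a shared per-point MLP followed by max-pooling."""
    def __init__(self, in_dim=4, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points):            # points: (B, N, in_dim)
        feats = self.mlp(points)          # (B, N, feat_dim)
        return feats.max(dim=1).values    # (B, feat_dim), permutation-invariant

class StudentPolicy(nn.Module):
    """Maps the fused visuotactile point cloud plus proprioception to actions."""
    def __init__(self, prop_dim, act_dim, feat_dim=256):
        super().__init__()
        self.encoder = PointNetEncoder(in_dim=4, feat_dim=feat_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + prop_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, points, proprio):
        z = self.encoder(points)
        return self.head(torch.cat([z, proprio], dim=-1))

def bc_step(student, optimizer, points, proprio, teacher_actions):
    """One Behavior Cloning update: match the teacher's actions."""
    pred = student(points, proprio)
    loss = nn.functional.mse_loss(pred, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```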

Experimentation and Outcomes

The system's capabilities are demonstrated on a series of benchmark in-hand object rotation tasks, ranging from single-object manipulation to more complex scenarios such as rotating two balls simultaneously. Policies trained in simulation are transferred to a real robot hand and handle a variety of in-hand rotation tasks without any additional real-world data. Overall, the system achieves strong Sim2Real performance and generalizes from the geometries seen in training to novel real-world objects.

The results indicate that Robot Synesthesia is a significant step toward sophisticated robotic manipulation. The integrated visuotactile representation enables a higher level of dexterous in-hand manipulation that is robust to occlusions and to variations in object shape and size. The findings also point to a promising path for robotic interaction with real-world environments, with potential applications in more complex domains where tactile and visual feedback are paramount.
