
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

(2312.01853)
Published Dec 4, 2023 in cs.RO, cs.CV, and cs.LG

Abstract

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

Overview

  • The paper presents Robot Synesthesia, a method to fuse tactile and visual data for robotic in-hand manipulation.

  • Robot Synesthesia represents tactile data as a point cloud, integrating it with a visual point cloud for a cohesive sensory experience.

  • A two-stage training pipeline with reinforcement learning is used, involving a teacher policy and a student policy that processes integrated sensory data with a PointNet encoder.

  • Experiments demonstrate the system's effectiveness in simulated in-hand rotation tasks and its ability to transfer to real-world applications.

  • The approach shows potential for enhancing robotic manipulation abilities and for adaptation to complex environments.

Introduction

Executing contact-rich manipulation tasks with robotic systems requires a nuanced integration of sensory inputs, specifically vision and touch. Fusing these modalities is complicated by their fundamentally different natures: tactile information tends to be sparse and low-dimensional, providing localized contact data, whereas visual feedback is usually dense and high-dimensional, offering a wide array of environmental cues. The challenge lies not only in processing these dissimilar data streams effectively but also in integrating them so that a robot can perform informed, dexterous manipulation.

Visuotactile Representation

To address the integration challenge, a novel approach named Robot Synesthesia is introduced. It draws inspiration from human tactile-visual synesthesia, in which certain individuals perceive colors when they touch objects. Robot Synesthesia represents tactile data from Force-Sensing Resistor (FSR) sensors as a point cloud, which is combined with a camera-generated point cloud in a single three-dimensional space. This representation preserves the spatial relationships between the robot's links, sensors, and objects, melding vision and touch into a cohesive sensory experience. Tactile point clouds are easy to generate in both simulated and real-world settings, which benefits in-hand manipulation by narrowing the Sim2Real transfer gap and improving spatial reasoning.
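The summary does not include code, but the core idea can be illustrated with a minimal sketch. Assuming each FSR reading is thresholded into contact/no-contact and each sensor's 3D position is obtained from the hand's forward kinematics, the tactile points can simply be appended to the camera point cloud, with an extra channel marking the modality. The function names, the threshold, and the modality flag below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tactile_point_cloud(fsr_readings, sensor_positions, contact_threshold=0.1):
    """Keep one 3D point per FSR sensor that registers contact.

    fsr_readings     : (N,) normalized sensor values in [0, 1]
    sensor_positions : (N, 3) sensor locations in the world frame
                       (e.g. from the hand's forward kinematics).
    """
    in_contact = fsr_readings > contact_threshold
    return sensor_positions[in_contact]

def fuse_point_clouds(visual_points, tactile_points):
    """Merge camera and tactile points into one cloud in a shared 3D frame.

    A binary modality channel is appended so a downstream point-cloud
    encoder can still distinguish vision from touch.
    """
    vis = np.hstack([visual_points,
                     np.zeros((len(visual_points), 1), dtype=np.float32)])  # flag 0 = vision
    tac = np.hstack([tactile_points,
                     np.ones((len(tactile_points), 1), dtype=np.float32)])  # flag 1 = touch
    return np.vstack([vis, tac]).astype(np.float32)
```

Because both modalities end up as points in the same frame, the fused cloud can be fed to a single point-cloud encoder rather than requiring separate branches for vision and touch.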

Training Pipeline

The training process has two stages. In the first stage, a 'teacher' policy is trained in simulation with reinforcement learning (RL), using privileged state information such as the robot's joint positions and the object pose. This teacher then supervises a 'student' policy that observes only the tactile and visual point cloud data. The student is first trained with Behavior Cloning on trajectories collected by the teacher and then refined with Dataset Aggregation (DAgger). A PointNet encoder over the visual and tactile inputs forms the backbone of the student policy, allowing the system to process the integrated sensory data.
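As a rough sketch of how such a student might be wired up, the snippet below pairs a simplified PointNet-style encoder (a shared per-point MLP followed by max-pooling) with a small action head, and shows a single Behavior Cloning step that regresses the student's actions onto the teacher's; DAgger would then relabel states visited by the student with teacher actions and add them to the same objective. The class names, feature sizes, proprioception input, and MSE loss are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Simplified PointNet: a shared per-point MLP followed by max-pooling."""
    def __init__(self, in_dim=4, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points):            # points: (B, N, in_dim)
        feats = self.mlp(points)          # (B, N, feat_dim)
        return feats.max(dim=1).values    # (B, feat_dim), permutation-invariant

class StudentPolicy(nn.Module):
    """Maps the fused visuotactile point cloud plus proprioception to actions."""
    def __init__(self, prop_dim, act_dim, feat_dim=256):
        super().__init__()
        self.encoder = PointNetEncoder(in_dim=4, feat_dim=feat_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + prop_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, points, proprio):
        z = self.encoder(points)
        return self.head(torch.cat([z, proprio], dim=-1))

def bc_step(student, optimizer, points, proprio, teacher_actions):
    """One Behavior Cloning update: match the teacher's actions."""
    pred = student(points, proprio)
    loss = nn.functional.mse_loss(pred, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```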

Experimentation and Outcomes

The system's capabilities are demonstrated on a series of benchmark in-hand object rotation tasks, ranging from single-object manipulation to more complex scenarios such as rotating two balls simultaneously. Policies trained in simulation are transferred to a real robot hand and handle a variety of in-hand rotation tasks without any additional real-world data. Overall, the system achieves strong Sim2Real performance and generalizes from the geometries seen in training to novel real-world objects.

The results indicate that Robot Synesthesia is a significant step toward sophisticated robotic manipulation. The integrated visuotactile representation enables a higher level of dexterous in-hand manipulation that is robust to occlusions and to variations in object shape and size. The findings also point to a promising path for robotic interaction with real-world environments, with potential applications in more complex domains where tactile and visual feedback are paramount.
