Learning Visuotactile Skills with Two Multifingered Hands

(2404.16823)
Published Apr 25, 2024 in cs.RO, cs.AI, cs.CV, and cs.LG

Abstract

Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks which are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/ .

Learned skills in tasks needing bimanual dexterity, including handovers, stacking, pouring, and serving.

Overview

  • The paper introduces a bimanual teleoperation system called HATO, utilizing multifingered hands to enhance robotic dexterity through the integration of visual and tactile data.

  • HATO system employs UR5e robot arms and customized prosthetic hands equipped with tactile sensors, controlled via Meta Quest 2 VR controllers for intuitive human-like manipulation.

  • Experiments demonstrated successful execution of complex tasks, underscoring the importance of combined sensory inputs for advanced robotic handling and policy learning.

Exploring Bimanual Multifingered Manipulation Using Visuotactile Data

Introduction

In pursuit of enhanced robotic dexterity, this study introduces a bimanual system with multifingered hands that leverages both visual and tactile data. Addressing the lack of affordable teleoperation systems and the limited availability of multifingered hands equipped with tactile sensors, the work develops a teleoperation system named HATO. The system uses commercial VR hardware for efficient data collection and policy learning, aiming to emulate complex human-like manipulation skills.

System Development and Challenges

The work outlines two primary innovations: the HATO system and the adaptation of prosthetic hands for detailed tactile sensing.

HATO: Hands-Arms Teleoperation

  • Hardware Utilization: The system pairs two UR5e robot arms with repurposed prosthetic hands, each equipped with tactile sensors.
  • Control Scheme: Meta Quest 2 controllers drive the setup, with controller motions mapped to robot arm movements and button and trigger inputs mapped to hand joint commands, providing intuitive control for complex task requirements (a minimal illustrative mapping is sketched below).
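
A minimal sketch of this style of teleoperation mapping, not the authors' implementation: controller translation drives the end-effector target, and the trigger interpolates the hand between open and closed joint configurations. The helper functions, calibration offset, and joint counts below are illustrative assumptions.

```python
import numpy as np

# Placeholder I/O helpers -- stand-ins for whatever VR SDK and robot drivers are used.
def get_controller_state():
    """Return (position xyz, orientation quaternion wxyz, trigger in [0, 1])."""
    return np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), 0.5

def send_arm_pose(position, orientation):
    """Command the arm end-effector to a Cartesian pose (placeholder)."""
    print("arm target:", position, orientation)

def send_hand_joints(joint_angles):
    """Command the multifingered hand to the given joint angles (placeholder)."""
    print("hand joints:", joint_angles)

# Assumed calibration offset between the VR frame and the robot base frame.
VR_TO_ROBOT_OFFSET = np.array([0.4, 0.0, 0.2])
FINGER_OPEN = np.zeros(6)        # fully open hand (6 actuated joints assumed)
FINGER_CLOSED = np.full(6, 1.4)  # fully closed hand

def teleop_step():
    """One teleoperation tick: controller pose -> arm pose, trigger -> grasp aperture."""
    pos, quat, trigger = get_controller_state()
    # Arm: controller translation maps directly to end-effector translation.
    send_arm_pose(pos + VR_TO_ROBOT_OFFSET, quat)
    # Hand: trigger value linearly interpolates between open and closed joint angles.
    joints = (1.0 - trigger) * FINGER_OPEN + trigger * FINGER_CLOSED
    send_hand_joints(joints)
```

In a real dual-arm setup, one such loop would run per controller/arm/hand pair.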

Multifingered Hands

  • Hand Design: Originally prosthetic devices, these hands are adapted with custom PCBs to facilitate research use, offering extensive touch sensitivity crucial for handling intricate tasks.

Methodology and Data Handling

The research team collected multimodal data using a comprehensive teleoperation setup, capturing precise robotic manipulations across various tasks.

Data Collection Process

  • Diverse sensory inputs, including proprioception, touch, and visual data, were synchronized and recorded at a fixed rate, ensuring comprehensive coverage of each manipulation (see the logging sketch below).
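
The sketch below illustrates one way time-aligned multimodal logging can be structured; the 10 Hz rate, the sensor-reading placeholders, and the on-disk format are assumptions for illustration, not the released pipeline or data format.

```python
import time
import numpy as np

RATE_HZ = 10  # assumed logging rate, for illustration only

def read_proprioception():
    """Placeholder: joint angles for both arms and both hands."""
    return np.zeros(12), np.zeros(12)

def read_touch():
    """Placeholder: per-fingertip tactile readings for both hands."""
    return np.zeros(10)

def read_cameras():
    """Placeholder: RGB frames from the mounted cameras."""
    return {"cam0": np.zeros((240, 320, 3), dtype=np.uint8)}

def record_episode(num_steps, out_path):
    """Record time-aligned proprioception, touch, and images at a fixed rate."""
    episode = {"timestamp": [], "proprio": [], "touch": [], "images": []}
    period = 1.0 / RATE_HZ
    for _ in range(num_steps):
        t0 = time.time()
        # Read all modalities back-to-back so each step shares one timestamp.
        episode["timestamp"].append(t0)
        episode["proprio"].append(read_proprioception())
        episode["touch"].append(read_touch())
        episode["images"].append(read_cameras())
        # Sleep off the remainder of the period to hold the target rate.
        time.sleep(max(0.0, period - (time.time() - t0)))
    np.save(out_path, episode, allow_pickle=True)
```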

Policy Learning

  • Policies are trained with a diffusion-based approach that models action sequences from the multimodal dataset, allowing them to predict manipulation actions that mimic human-like dexterity and responsiveness (a schematic sketch follows).
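
To make the idea concrete, here is a schematic sketch of diffusion-style action prediction: an observation embedding conditions an iterative denoising of a short action chunk. The network, noise schedule, dimensions, and DDIM-style sampler below are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
import torch

# Illustrative dimensions -- not the paper's actual configuration.
OBS_DIM = 512      # fused vision + touch + proprioception embedding
ACT_DIM = 26       # combined action vector for two arms and two hands (assumed)
HORIZON = 16       # length of the predicted action chunk
NUM_STEPS = 50     # number of diffusion steps

class DenoiseNet(torch.nn.Module):
    """Predicts the noise added to an action chunk, conditioned on the observation."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(OBS_DIM + HORIZON * ACT_DIM + 1, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, HORIZON * ACT_DIM),
        )

    def forward(self, obs_emb, noisy_actions, t):
        t_feat = t.float().unsqueeze(-1) / NUM_STEPS
        x = torch.cat([obs_emb, noisy_actions.flatten(1), t_feat], dim=-1)
        return self.net(x).view(-1, HORIZON, ACT_DIM)

@torch.no_grad()
def sample_actions(model, obs_emb, alphas_cumprod):
    """Iteratively denoise Gaussian noise into an action chunk (deterministic DDIM-style update)."""
    actions = torch.randn(obs_emb.shape[0], HORIZON, ACT_DIM)
    for t in reversed(range(NUM_STEPS)):
        t_batch = torch.full((obs_emb.shape[0],), t)
        eps = model(obs_emb, actions, t_batch)
        alpha_bar = alphas_cumprod[t]
        # Estimate the clean action chunk from the predicted noise.
        x0 = (actions - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
        if t > 0:
            alpha_bar_prev = alphas_cumprod[t - 1]
            actions = alpha_bar_prev.sqrt() * x0 + (1 - alpha_bar_prev).sqrt() * eps
        else:
            actions = x0
    return actions

# Example usage with a random observation embedding and a linear noise schedule.
betas = torch.linspace(1e-4, 0.02, NUM_STEPS)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
actions = sample_actions(DenoiseNet(), torch.randn(1, OBS_DIM), alphas_cumprod)
```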

Experimental Results and Discussion

The experiments involved four complex bimanual tasks, including slippery-object handover and tool-based tasks such as steak serving. These tasks tested the system’s ability to handle objects of varying textures, weights, and complexities.

Task Performance

  • The system demonstrated high success rates across most tasks, particularly in adaptive grasping and precise manipulation.

Impact of Sensory Modalities

  • Empirical evaluations showed that combining touch and vision was instrumental to effective learning and task robustness, underscoring the value of integrated sensory inputs for policy learning (one common way to set up such modality ablations is sketched below).
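
A minimal sketch of how such modality ablations are often configured (an assumption for illustration, not the paper's code): per-modality features are concatenated into the policy input, and ablating a modality simply drops its features.

```python
import torch

def fuse_observations(proprio, vision_emb=None, touch_emb=None):
    """Concatenate the enabled modality features into a single policy input.

    Passing None for a modality (e.g. touch-only or vision-only training)
    omits its features, so the same policy architecture can be trained and
    compared across sensing configurations.
    """
    parts = [proprio]
    if vision_emb is not None:
        parts.append(vision_emb)
    if touch_emb is not None:
        parts.append(touch_emb)
    return torch.cat(parts, dim=-1)

# Example: vision + touch vs. vision-only inputs for the same batch.
proprio = torch.randn(8, 24)
vision = torch.randn(8, 512)
touch = torch.randn(8, 20)
full_obs = fuse_observations(proprio, vision_emb=vision, touch_emb=touch)  # (8, 556)
vision_only = fuse_observations(proprio, vision_emb=vision)                # (8, 536)
```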

Conclusions and Future Work

The study verifies the effectiveness of a low-cost, multifingered, bimanual system in executing dexterous tasks that approach human-like precision. It opens avenues for future work on incorporating haptic feedback to enrich teleoperation realism and on improving generalization across more diverse settings.

The researchers advocate continuing this line of work, suggesting that it can expand the capabilities of robotic systems to tasks requiring nuanced, human-like dexterity and interaction. The open-source release of the hardware and software platforms used in this research aims to foster further exploration and collaboration within the field.
