
Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

(2403.04436)
Published Mar 7, 2024 in cs.RO, cs.AI, cs.LG, cs.SY, and eess.SY

Abstract

We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot with only an RGB camera. To create a large-scale retargeted motion dataset of human movements for humanoid robots, we propose a scalable "sim-to-data" process to filter and pick feasible motions using a privileged motion imitator. Afterwards, we train a robust real-time humanoid motion imitator in simulation using these refined motions and transfer it to the real humanoid robot in a zero-shot manner. We successfully achieve teleoperation of dynamic whole-body motions in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, boxing, etc. To the best of our knowledge, this is the first demonstration to achieve learning-based real-time whole-body humanoid teleoperation.

Overview

  • The paper introduces Human to Humanoid (H2O), a novel framework for real-time whole-body teleoperation of humanoid robots using just an RGB camera.

  • H2O employs reinforcement learning and a "sim-to-data" process to refine human motion data for compatibility with humanoid constraints, achieving zero-shot transfer from simulation to the real robot.

  • The framework demonstrates the ability to have a humanoid robot mimic dynamic human motions with high fidelity in real time.

  • The successful application of H2O suggests significant future advancements in humanoid robot capabilities and human-robot collaboration in complex tasks.


Introduction

Humanoid robots, by virtue of their design, offer unique advantages for tasks that require human-like interaction with the environment. However, controlling these robots to replicate the wide range of human motions in real-time presents a substantial challenge. Conventional model-based approaches to humanoid teleoperation often require simplifications and are heavily reliant on external sensor setups, limiting their applicability in dynamic tasks. Recent advancements in reinforcement learning offer promising solutions, but the application of these techniques to real-world humanoid teleoperation, particularly at the whole-body level, remains largely unexplored.

This study presents Human to Humanoid (H2O), a novel framework that enables the teleoperation of a full-sized humanoid robot using only an RGB camera. The approach is grounded in reinforcement learning, augmented with a "sim-to-data" process that refines a large-scale human motion dataset for feasibility with real-world humanoid constraints. Crucially, this system achieves zero-shot transfer from simulation to real-world application, demonstrating the robot's capability to mimic dynamic human motions such as walking, kicking, and complex gesturing in real time.
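The real-time pipeline described here — an RGB frame is lifted to a human pose estimate, the pose is retargeted into a motion goal for the humanoid, and an RL policy maps the robot state and goal to an action — can be sketched as a single control step. This is a minimal sketch only; the component names (`pose_estimator`, `retarget`, `policy`) are hypothetical placeholders, not the paper's actual interfaces:

```python
def teleoperation_step(frame, pose_estimator, retarget, policy, robot_state):
    """One teleoperation control cycle: RGB frame -> human pose -> goal -> action.

    All callables are stand-ins for the framework's components:
      pose_estimator(frame)        -> 3D human pose from the RGB image
      retarget(human_pose)         -> humanoid-compatible reference motion
      policy(robot_state, goal)    -> RL policy output (e.g., joint targets)
    """
    human_pose = pose_estimator(frame)   # perceive the operator from one camera
    goal = retarget(human_pose)          # map human motion onto the humanoid body
    action = policy(robot_state, goal)   # track the goal with the learned policy
    return action
```

Running this loop once per camera frame is what makes the teleoperation "real-time": no offline trajectory optimization sits between the operator and the robot.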

Methodology

Retargeting Human Motions

A key innovation in H2O is its scalable approach to retargeting human motion for humanoid robots. This process involves adjusting large-scale human motion data to fit the physical constraints and capabilities of a humanoid. The study introduces a two-step retargeting process, initially adapting the human body model (SMPL) to match the humanoid's structure, followed by a novel "sim-to-data" method. This method employs a privileged motion imitator to filter out infeasible motions from the retargeted dataset, resulting in a refined set of motions that the real-world humanoid can feasibly execute.
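The "sim-to-data" filtering step can be sketched as follows: roll out the privileged imitator on each retargeted clip in simulation and keep only the clips it can track within an error bound. This is an illustrative sketch under assumed interfaces — the function names, error metric, and threshold are placeholders, not the paper's implementation:

```python
def filter_feasible_motions(clips, tracking_error_fn, err_threshold=0.5):
    """Keep only clips a privileged motion imitator can track in simulation.

    clips:             dict mapping clip name -> motion sequence
    tracking_error_fn: callable(clip) -> per-frame tracking errors from a
                       simulated rollout of the privileged imitator
    err_threshold:     illustrative cutoff, not the paper's actual value
    """
    feasible = {}
    for name, clip in clips.items():
        errors = tracking_error_fn(clip)
        # Reject a clip if the imitator's tracking error ever exceeds the bound:
        # such motions are physically infeasible for the humanoid.
        if max(errors) < err_threshold:
            feasible[name] = clip
    return feasible
```

The design point is that feasibility is decided by a simulated controller rather than by hand-written kinematic rules, which is what makes the filtering scale to a large motion dataset.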

Real-Time Whole-Body Teleoperation Training

The study articulates a comprehensive training regimen for the humanoid robot, utilizing the retargeted and refined motion dataset. Key to this process is the formulation of an appropriate state space that captures essential motion details while remaining computationally tractable for real-time application. The framework incorporates advanced domain randomization techniques to ensure robustness and generalization of the control policy from simulation to real-world execution.
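Domain randomization of the kind mentioned above typically perturbs simulator physics each training episode so the policy cannot overfit to one dynamics model. A minimal sketch, assuming typical randomized quantities (friction, link masses, motor strength, action delay) with illustrative ranges — neither the parameter set nor the ranges are taken from the paper:

```python
import random
from dataclasses import dataclass

@dataclass
class SimParams:
    friction: float
    link_mass_scale: float
    motor_strength_scale: float
    action_delay_steps: int

# Illustrative ranges only; the paper's actual randomization ranges are not given here.
RANGES = {
    "friction": (0.5, 1.25),
    "link_mass_scale": (0.9, 1.1),
    "motor_strength_scale": (0.8, 1.2),
    "action_delay_steps": (0, 2),
}

def sample_domain_randomization(rng=random):
    """Draw one randomized set of simulator parameters for a training episode."""
    return SimParams(
        friction=rng.uniform(*RANGES["friction"]),
        link_mass_scale=rng.uniform(*RANGES["link_mass_scale"]),
        motor_strength_scale=rng.uniform(*RANGES["motor_strength_scale"]),
        action_delay_steps=rng.randint(*RANGES["action_delay_steps"]),
    )
```

Resampling these parameters at every episode reset forces the policy to remain stable across a family of dynamics, which is what underwrites the zero-shot sim-to-real transfer.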

Evaluation and Results

The H2O framework was rigorously tested in both simulated environments and real-world scenarios. In simulation, H2O demonstrated superior performance in motion tracking accuracy and success rate over baseline approaches. Notably, the framework exhibited significant resilience to data reduction, maintaining high success rates even when trained on a substantially smaller motion dataset. In real-world tests, H2O enabled a full-sized Unitree H1 humanoid robot to replicate a wide variety of dynamic human motions with high fidelity, signifying a substantial leap forward in humanoid teleoperation capabilities.

Implications and Future Directions

The success of H2O in enabling advanced humanoid teleoperation has broad implications for the use of humanoid robots in environments that demand human-like dexterity and adaptability. Looking ahead, the study outlines potential avenues for further research, including improving the representation of motion goals, closing the embodiment gap between humans and robots, and advancing human-robot interaction to make teleoperation more efficient and intuitive.

In summary, H2O represents a significant advancement in the field of humanoid robotics, offering a robust and scalable solution for real-time whole-body teleoperation using only an RGB camera. This work not only extends the frontier of humanoid robot control but also paves the way for greater human-robot collaboration in complex real-world tasks.
