
Visual Whole-Body Control for Legged Loco-Manipulation

(2403.16967)
Published Mar 25, 2024 in cs.RO, cs.CV, and cs.LG

Abstract

We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control (VBC), is composed of a low-level policy using all degrees of freedom to track the body velocities along with the end-effector position, and a high-level policy proposing the velocities and end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.

The VBC framework separately trains a low-level control policy and a high-level planning policy for whole-body manipulation.

Overview

  • Visual Whole-Body Control (VBC) is a framework for autonomous mobile manipulation with arm-equipped legged robots, driven solely by visual inputs.

  • VBC comprises a low-level policy for locomotion and manipulation control, and a high-level policy for task planning, both designed to work in diverse environments without task-specific adjustments.

  • The framework employs reinforcement learning, Regularized Online Adaptation for sim-to-real transfer, and distillation from a privileged teacher policy into a deployable visuomotor student policy.

  • Demonstrated on a Unitree B1 quadruped robot with a Unitree Z1 arm, VBC picks up diverse objects across different heights, locations, and orientations, advancing the capabilities of autonomous mobile manipulation.

Visual Whole-Body Control for Legged Loco-Manipulation: Expanding the Horizons of Mobile Robots

Introduction to Visual Whole-Body Control (VBC)

In the pursuit of enhancing robotic mobility and manipulation capabilities beyond conventional limitations, Visual Whole-Body Control (VBC) emerges as a comprehensive framework developed by researchers from UC San Diego, Shanghai Jiao Tong University, and Fudan University. VBC is engineered to address the intricacies of mobile manipulation using legged robots equipped with arms, enabling these machines to handle objects across diverse environments autonomously, guided solely by visual inputs.

The Framework

VBC introduces a two-tiered architecture: a low-level policy that manages all degrees of freedom for locomotion and manipulation, and a high-level policy that proposes body velocities and end-effector poses based on segmented depth images. Notably, this design executes tasks in varied settings without task-specific fine-tuning, showcasing the framework's adaptability and efficiency.
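To make the division of labor concrete, the following is a minimal sketch of the two-level loop. All class names, dimensions, and update rates here are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

class HighLevelPolicy:
    """Proposes a body velocity and end-effector pose from a segmented depth image."""
    def act(self, depth_image: np.ndarray, proprio: np.ndarray) -> dict:
        # Network inference omitted; returns a command for the low-level policy.
        return {
            "body_velocity": np.zeros(3),   # (vx, vy, yaw rate)
            "ee_position": np.zeros(3),     # desired end-effector position
            "ee_orientation": np.zeros(3),  # desired end-effector orientation
        }

class LowLevelPolicy:
    """Tracks the commanded velocity and end-effector pose using all joints."""
    def act(self, command: dict, proprio: np.ndarray) -> np.ndarray:
        # Network inference omitted; returns joint position targets
        # (e.g., 12 leg joints plus 6 arm joints on a B1 + Z1 platform).
        return np.zeros(18)

high, low = HighLevelPolicy(), LowLevelPolicy()
command = None
for step in range(1000):
    proprio = np.zeros(48)                     # placeholder proprioceptive state
    if step % 10 == 0:                         # high level replans at a lower rate
        depth = np.zeros((64, 64))             # placeholder segmented depth frame
        command = high.act(depth, proprio)
    joint_targets = low.act(command, proprio)  # fed to joint-level PD controllers
```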

Low-Level Policy

The low-level policy within VBC is engineered to track end-effector poses and robot body velocities, enabling the robot to produce whole-body behaviors across different terrains. This policy is trained with reinforcement learning (RL) to follow commanded goals, allowing the robot to interact with its environment robustly.

  • Command and Observation: The policy receives commands specifying the desired end-effector position and orientation along with body velocities. Its observations combine the robot's proprioceptive state with latent environment extrinsics, enabling responses that adapt to the current dynamics.
  • Adaptation and Control: A significant innovation in the low-level policy is the incorporation of Regularized Online Adaptation (ROA), which facilitates sim-to-real transfer by estimating environmental factors such as terrain type and robot mass variation from onboard observations (see the sketch after this list).
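As a rough illustration of the ROA idea: a privileged encoder (simulation-only) and an adaptation module (deployment-time) are trained to produce matching latents. The module names, dimensions, and loss weighting below are assumptions for the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PRIV_DIM, HIST_DIM, LATENT_DIM = 9, 30 * 48, 16   # assumed sizes

# Encodes privileged extrinsics (e.g., payload mass, friction) known only in simulation.
priv_encoder = nn.Sequential(nn.Linear(PRIV_DIM, 64), nn.ELU(), nn.Linear(64, LATENT_DIM))

# Estimates the same latent from a history of proprioceptive observations,
# which is all the real robot can measure.
adapter = nn.Sequential(nn.Linear(HIST_DIM, 256), nn.ELU(), nn.Linear(256, LATENT_DIM))

def roa_latent_and_reg(priv_obs, proprio_history, lam=1.0):
    z_priv = priv_encoder(priv_obs)
    z_adapt = adapter(proprio_history)
    # The regularizer pulls the two latents together so that, at deployment,
    # the adapter alone can supply the extrinsics latent to the policy.
    reg = F.mse_loss(z_adapt, z_priv.detach()) + lam * F.mse_loss(z_priv, z_adapt.detach())
    return z_priv, reg   # z_priv conditions the RL policy during training
```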

High-Level Task-Planning Policy

The high-level policy operates on visual observations to navigate and manipulate objects with precision. Given the challenges associated with high-frequency control through direct RL from raw visual data, the approach instead distills guidance from a privileged teacher policy, which possesses access to object shape and pose details, into a visuomotor student policy.

  • Privileged Teacher Policy: The teacher policy is trained via RL with privileged access to object shape and pose features, allowing it to learn precise manipulation strategies that a vision-only policy could not easily discover from scratch.
  • Visual-Motor Coordination: The visuomotor student policy is distilled from the teacher and derives the same commands from segmented depth images alone, matching the observations available on the real robot (a distillation sketch follows this list).
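Below is a minimal sketch of this teacher-to-student distillation, assuming supervised regression of the student's commands onto the teacher's; all architecture choices and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class TeacherPolicy(nn.Module):
    """Trained with RL; sees privileged object shape and pose features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32 + 48, 128), nn.ELU(), nn.Linear(128, 9))

    def forward(self, obj_feats, proprio):
        # Outputs a command: body velocity plus end-effector position/orientation.
        return self.net(torch.cat([obj_feats, proprio], dim=-1))

class StudentPolicy(nn.Module):
    """Sees only a segmented depth image and proprioception, as on the real robot."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ELU(), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 * 13 * 13 + 48, 128), nn.ELU(), nn.Linear(128, 9))

    def forward(self, depth, proprio):
        feats = self.cnn(depth)   # depth: (N, 1, 64, 64)
        return self.head(torch.cat([feats, proprio], dim=-1))

teacher, student = TeacherPolicy(), StudentPolicy()
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(depth, proprio, obj_feats):
    with torch.no_grad():
        target = teacher(obj_feats, proprio)   # privileged action label
    loss = (student(depth, proprio) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```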

Real-World Deployment and Implications

VBC's practical viability is demonstrated by its deployment on a Unitree B1 quadruped robot equipped with a Unitree Z1 robotic arm. The system reliably picks up a variety of objects across different heights and surfaces, showing a high degree of adaptability and autonomy.

  • Sim-To-Real Transfer: Domain randomization and depth image processing (see the sketch after this list) allow the policies trained in simulation to transfer to real-world scenarios, underscoring the effectiveness of the training methodology.
  • System Design and Challenges: Adjustments to the hardware and software architecture reconcile the different operating frequencies of the arm, legs, and perception pipeline, which is essential for efficient task execution.
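One common form of such depth processing is to clip, corrupt, and normalize simulated depth so it resembles a real sensor's output. The parameters below are illustrative guesses, not the paper's settings.

```python
import numpy as np

def process_depth(depth_m: np.ndarray, near=0.1, far=3.0,
                  noise_std=0.01, hole_prob=0.02) -> np.ndarray:
    d = np.clip(depth_m, near, far)                    # clip to the camera's valid range
    d = d + np.random.normal(0.0, noise_std, d.shape)  # simulated sensor noise
    holes = np.random.rand(*d.shape) < hole_prob       # random invalid pixels,
    d[holes] = far                                     # mimicking real depth dropouts
    return np.clip((d - near) / (far - near), 0.0, 1.0)  # normalize to [0, 1]

sim_depth = np.random.uniform(0.2, 2.5, size=(64, 64))  # placeholder simulated frame
obs = process_depth(sim_depth)
```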

Conclusion and Future Directions

VBC presents a holistic strategy for enabling legged robots to navigate and manipulate objects in complex environments, driven by visual inputs. The framework's success in sim-to-real transfer, combined with its hierarchical design, provides a strong foundation for further research on hardware improvements and system simplification. Future work may focus on refining visual input quality, enhancing gripper functionality, and streamlining system operation.
