
Visual Whole-Body Control for Legged Loco-Manipulation

(2403.16967)
Published Mar 25, 2024 in cs.RO, cs.CV, and cs.LG

Abstract

We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control (VBC), is composed of a low-level policy using all degrees of freedom to track the body velocities along with the end-effector position, and a high-level policy proposing the velocities and end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.

The VBC framework separately trains a low-level control policy and a high-level planning policy for whole-body manipulation.

Overview

  • Visual Whole-Body Control (VBC) is a framework for autonomous mobile manipulation with arm-equipped legged robots, driven solely by visual inputs.

  • VBC comprises a low-level policy for locomotion and manipulation control, and a high-level policy for task planning, both designed to work in diverse environments without task-specific adjustments.

  • The framework employs reinforcement learning, Regularized Online Adaptation for sim-to-real transfer, and distillation from a privileged teacher policy into a deployable visuomotor student policy.

  • Demonstrated on a Unitree B1 quadruped robot with a Unitree Z1 arm, VBC picks up diverse objects across different heights, locations, and orientations, advancing the capabilities of autonomous mobile manipulation.

Visual Whole-Body Control for Legged Loco-Manipulation: Expanding the Horizons of Mobile Robots

Introduction to Visual Whole-Body Control (VBC)

In the pursuit of enhancing robotic mobility and manipulation capabilities beyond conventional limitations, Visual Whole-Body Control (VBC) emerges as a comprehensive framework developed by researchers from UC San Diego, Shanghai Jiao Tong University, and Fudan University. VBC is engineered to address the intricacies of mobile manipulation using legged robots equipped with arms, enabling these machines to handle objects across diverse environments autonomously, guided solely by visual inputs.

The Framework

VBC introduces a two-tiered architecture: a low-level policy that manages all degrees of freedom for locomotion and manipulation, and a high-level policy that proposes body velocities and end-effector poses based on segmented depth images. Notably, this design executes tasks in varied settings without task-specific fine-tuning, showcasing the framework's adaptability and efficiency.
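To make the division of labor concrete, the following is a minimal sketch of the two-level loop. All class names, dimensions, and update rates here are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

class HighLevelPolicy:
    """Proposes a body velocity and end-effector pose from a segmented depth image."""
    def act(self, depth_image: np.ndarray, proprio: np.ndarray) -> dict:
        # Network inference omitted; returns a command for the low-level policy.
        return {
            "body_velocity": np.zeros(3),   # (vx, vy, yaw rate)
            "ee_position": np.zeros(3),     # desired end-effector position
            "ee_orientation": np.zeros(3),  # desired end-effector orientation
        }

class LowLevelPolicy:
    """Tracks the commanded velocity and end-effector pose using all joints."""
    def act(self, command: dict, proprio: np.ndarray) -> np.ndarray:
        # Network inference omitted; returns joint position targets
        # (e.g., 12 leg joints plus 6 arm joints on a B1 + Z1 platform).
        return np.zeros(18)

high, low = HighLevelPolicy(), LowLevelPolicy()
command = None
for step in range(1000):
    proprio = np.zeros(48)                     # placeholder proprioceptive state
    if step % 10 == 0:                         # high level replans at a lower rate
        depth = np.zeros((64, 64))             # placeholder segmented depth frame
        command = high.act(depth, proprio)
    joint_targets = low.act(command, proprio)  # fed to joint-level PD controllers
```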

Low-Level Policy

The low-level policy within VBC is engineered to track end-effector poses and robot body velocities, enabling the robot to produce whole-body behaviors across different terrains. This policy is trained with reinforcement learning (RL) to follow commanded goals, allowing the robot to interact with its environment robustly.

  • Command and Observation: The policy receives commands specifying the desired end-effector position and orientation along with body velocities. Its observations combine the robot's proprioceptive state with latent environment extrinsics, enabling responses that adapt to the current dynamics.
  • Adaptation and Control: A significant innovation in the low-level policy is the incorporation of Regularized Online Adaptation (ROA), which facilitates sim-to-real transfer by estimating environmental factors such as terrain type and robot mass variation from onboard observations (see the sketch after this list).
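As a rough illustration of the ROA idea: a privileged encoder (simulation-only) and an adaptation module (deployment-time) are trained to produce matching latents. The module names, dimensions, and loss weighting below are assumptions for the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PRIV_DIM, HIST_DIM, LATENT_DIM = 9, 30 * 48, 16   # assumed sizes

# Encodes privileged extrinsics (e.g., payload mass, friction) known only in simulation.
priv_encoder = nn.Sequential(nn.Linear(PRIV_DIM, 64), nn.ELU(), nn.Linear(64, LATENT_DIM))

# Estimates the same latent from a history of proprioceptive observations,
# which is all the real robot can measure.
adapter = nn.Sequential(nn.Linear(HIST_DIM, 256), nn.ELU(), nn.Linear(256, LATENT_DIM))

def roa_latent_and_reg(priv_obs, proprio_history, lam=1.0):
    z_priv = priv_encoder(priv_obs)
    z_adapt = adapter(proprio_history)
    # The regularizer pulls the two latents together so that, at deployment,
    # the adapter alone can supply the extrinsics latent to the policy.
    reg = F.mse_loss(z_adapt, z_priv.detach()) + lam * F.mse_loss(z_priv, z_adapt.detach())
    return z_priv, reg   # z_priv conditions the RL policy during training
```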

High-Level Task-Planning Policy

The high-level policy operates on visual observations to navigate and manipulate objects with precision. Given the challenges associated with high-frequency control through direct RL from raw visual data, the approach instead distills guidance from a privileged teacher policy, which possesses access to object shape and pose details, into a visuomotor student policy.

  • Privileged Teacher Policy: The teacher policy is trained via RL with privileged access to object shape and pose features, allowing it to learn precise manipulation strategies that a vision-only policy could not easily discover from scratch.
  • Visual-Motor Coordination: The visuomotor student policy is distilled from the teacher and derives the same commands from segmented depth images alone, matching the observations available on the real robot (a distillation sketch follows this list).
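Below is a minimal sketch of this teacher-to-student distillation, assuming supervised regression of the student's commands onto the teacher's; all architecture choices and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class TeacherPolicy(nn.Module):
    """Trained with RL; sees privileged object shape and pose features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32 + 48, 128), nn.ELU(), nn.Linear(128, 9))

    def forward(self, obj_feats, proprio):
        # Outputs a command: body velocity plus end-effector position/orientation.
        return self.net(torch.cat([obj_feats, proprio], dim=-1))

class StudentPolicy(nn.Module):
    """Sees only a segmented depth image and proprioception, as on the real robot."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ELU(), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 * 13 * 13 + 48, 128), nn.ELU(), nn.Linear(128, 9))

    def forward(self, depth, proprio):
        feats = self.cnn(depth)   # depth: (N, 1, 64, 64)
        return self.head(torch.cat([feats, proprio], dim=-1))

teacher, student = TeacherPolicy(), StudentPolicy()
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(depth, proprio, obj_feats):
    with torch.no_grad():
        target = teacher(obj_feats, proprio)   # privileged action label
    loss = (student(depth, proprio) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```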

Real-World Deployment and Implications

VBC's practical viability is demonstrated by its deployment on a Unitree B1 quadruped robot equipped with a Unitree Z1 robotic arm. The system reliably picks up a variety of objects across different heights and surfaces, showing a high degree of adaptability and autonomy.

  • Sim-To-Real Transfer: Domain randomization and depth image processing (see the sketch after this list) allow the policies trained in simulation to transfer to real-world scenarios, underscoring the effectiveness of the training methodology.
  • System Design and Challenges: Adjustments to the hardware and software architecture reconcile the different operating frequencies of the arm, legs, and perception pipeline, which is essential for efficient task execution.
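One common form of such depth processing is to clip, corrupt, and normalize simulated depth so it resembles a real sensor's output. The parameters below are illustrative guesses, not the paper's settings.

```python
import numpy as np

def process_depth(depth_m: np.ndarray, near=0.1, far=3.0,
                  noise_std=0.01, hole_prob=0.02) -> np.ndarray:
    d = np.clip(depth_m, near, far)                    # clip to the camera's valid range
    d = d + np.random.normal(0.0, noise_std, d.shape)  # simulated sensor noise
    holes = np.random.rand(*d.shape) < hole_prob       # random invalid pixels,
    d[holes] = far                                     # mimicking real depth dropouts
    return np.clip((d - near) / (far - near), 0.0, 1.0)  # normalize to [0, 1]

sim_depth = np.random.uniform(0.2, 2.5, size=(64, 64))  # placeholder simulated frame
obs = process_depth(sim_depth)
```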

Conclusion and Future Directions

VBC presents a holistic strategy for enabling legged robots to navigate and manipulate objects in complex environments, driven by visual inputs. The framework's success in sim-to-real transfer, combined with its hierarchical design, provides a strong foundation for further research on hardware improvements and system simplification. Future work may focus on refining visual input quality, enhancing gripper functionality, and streamlining system operation.
