Emergent Mind

Abstract

This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to generalize across loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control with manipulation using the legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. Because this controller is task-agnostic, it can be reused by a high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks with a quadrupedal robot in the real world. These tasks span state-based and vision-based policies, and range from training purely on simulation data to learning from real-world data. Across these tasks, HiLMa-Res outperforms competing methods.

HiLMa-Res framework: hierarchical training of locomotion controller and task-specific manipulation planner for loco-manipulation tasks.

Overview

  • HiLMa-Res introduces a hierarchical RL framework for quadrupedal robots, combining task-independent locomotion control with task-specific manipulation planners, facilitating versatile loco-manipulation tasks.

  • The framework includes a novel operational space locomotion controller and utilizes Bézier curves for flexible trajectory adjustments, enabling both state-based and vision-based observation inputs for effective real-world performance.

  • HiLMa-Res demonstrates superior performance in tasks like ball dribbling and load navigation, compared to existing methods, and shows promising results in both simulation and real-world environments.

Overview of HiLMa-Res Framework for Quadrupedal Locomotion and Manipulation

The paper "HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation" presents a hierarchical framework named HiLMa-Res. The framework employs reinforcement learning (RL) to address loco-manipulation tasks using quadrupedal robots, facilitating continuous locomotion alongside manipulation capabilities.

Key Contributions

  1. Hierarchical Reinforcement Learning: HiLMa-Res introduces a hierarchical RL approach combining a task-independent locomotion controller with task-specific manipulation planners. This separation allows a single framework to be reused across various loco-manipulation tasks.
  2. Operational Space Locomotion Controller: The framework leverages a novel operational space locomotion controller capable of tracking arbitrary end-effector (toe) trajectories. This controller supports continuous mobility and is generalizable across different downstream tasks.
  3. Task-Specific Planners: HiLMa-Res employs high-level manipulation planners to address specific tasks by specifying residual trajectories and base commands for the locomotion controller. These planners utilize either state-based or vision-based observation inputs.
  4. Evaluation on Multiple Tasks: The versatility of HiLMa-Res is demonstrated through its application to tasks like ball dribbling, stepping over stones, and navigating loads. The framework shows superior performance compared to state-of-the-art methods in these tasks.
  5. Efficient Real-World Learning: The framework's structure facilitates efficient real-world learning. Particularly in the load navigation task, the HiLMa-Res framework achieves effective and efficient policy learning via RLPD, a technique for real-world reinforcement learning from pre-collected data.

Technical Details

Task-Independent Quadrupedal Locomotion Control

The locomotion controller in HiLMa-Res is designed to be broadly applicable to multiple downstream loco-manipulation tasks. It combines nominal trajectories from a Central Pattern Generator (CPG) and residual trajectories represented by Bézier curves.

  1. CPG-Based Nominal Trajectories: The CPG generates periodic swing foot trajectories parameterized by desired base velocities and turning yaw rates, facilitating operational space planning.
  2. Bézier Residual Trajectories: Bézier curves add flexibility to the nominal trajectories, allowing for fine adjustments of the swing legs while maintaining gait stability.
  3. Control Policy: This policy is trained in a simulated environment using PPO, and its goal is to track end-effector trajectories and base commands while executing a trotting gait.
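The CPG-plus-residual idea above can be sketched in a few lines. The code below is an illustrative toy, not the paper's implementation: the CPG swing profile (forward progress plus a sinusoidal height curve), the step length/height values, and the number of Bézier control points are all assumptions made for the example.

```python
import numpy as np

def bezier(control_points, s):
    """Evaluate a Bézier curve at phase s in [0, 1] via De Casteljau's algorithm."""
    pts = np.asarray(control_points, dtype=float)
    while len(pts) > 1:
        pts = (1 - s) * pts[:-1] + s * pts[1:]
    return pts[0]

def cpg_swing_trajectory(s, step_length=0.10, step_height=0.06):
    """Nominal swing-foot trajectory from a simple phase-based CPG:
    the toe advances by step_length while following a sinusoidal
    height profile over the swing phase s in [0, 1] (toy parameters)."""
    x = step_length * (s - 0.5)          # forward progress over the swing
    z = step_height * np.sin(np.pi * s)  # lift at mid-swing, touch down at s=1
    return np.array([x, 0.0, z])

def toe_target(s, residual_ctrl_pts):
    """Toe target = CPG nominal + Bézier residual: the residual lets a
    high-level planner reshape the swing without breaking gait periodicity."""
    return cpg_swing_trajectory(s) + bezier(residual_ctrl_pts, s)
```

With all-zero residual control points, the toe target reduces to the nominal CPG trajectory, so the residual acts purely as a task-driven correction.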

Task-Specific Manipulation Planning

The task-specific planners in HiLMa-Res use RL to specify action goals for the locomotion controller, leveraging both state-based and vision-based observations.

  1. State-Based Policies: Applied in tasks such as ball dribbling and load navigation, state-based policies utilize proprioceptive feedback and object positions to generate commands.
  2. Vision-Based Policies: Used in tasks like stepping over stones, these policies rely on raw depth vision inputs from onboard cameras. This allows the robot to adjust gait trajectories in real-time to avoid obstacles.
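The interface between the two levels can be made concrete with a small sketch. Everything here is hypothetical: the `PlannerAction` fields, the flat 15-dimensional policy output, and its slicing into four 3D control points plus base commands are illustrative choices, not the paper's actual action parameterization.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlannerAction:
    """Hypothetical action interface between the high-level planner
    and the low-level locomotion controller."""
    residual_ctrl_pts: np.ndarray  # Bézier control points for the swing leg
    base_velocity_cmd: np.ndarray  # desired base velocity (vx, vy)
    yaw_rate_cmd: float            # desired turning yaw rate

def plan_step(policy, observation):
    """Query a planner policy (state- or vision-based) and package its
    flat output vector into structured commands for the controller."""
    raw = policy(observation)  # e.g. a trained RL policy network
    return PlannerAction(
        residual_ctrl_pts=raw[:12].reshape(4, 3),  # 4 control points, xyz
        base_velocity_cmd=raw[12:14],
        yaw_rate_cmd=float(raw[14]),
    )
```

The point of the structure is that the same `PlannerAction` type serves both state-based and vision-based planners; only the policy's observation changes.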

Training in Simulation and Real World

The HiLMa-Res framework supports efficient training in both simulation and the real world:

  1. Simulation Training: Policies are initially trained using diverse trajectory inputs and dynamics randomization in simulation environments. This builds robust controllers capable of zero-shot transfer to real robots.
  2. Real-World Fine-Tuning: For tasks requiring precise interaction with the environment (e.g., load navigation), policies are fine-tuned using real-world data collection techniques like RLPD.
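The core data recipe of RLPD is symmetric sampling: each training batch mixes pre-collected (offline) and freshly collected (online) transitions 50/50. The sketch below illustrates that recipe only; it uses plain lists in place of proper replay buffers, and the batch size is an arbitrary example value.

```python
import random

def sample_rlpd_batch(offline_buffer, online_buffer, batch_size=256):
    """RLPD-style symmetric sampling: draw half of each training batch
    from the pre-collected offline data and half from the online replay
    buffer. Buffers here are plain lists of transitions; a real
    implementation would use ring-buffer replay with uniform sampling."""
    half = batch_size // 2
    batch = random.sample(offline_buffer, min(half, len(offline_buffer)))
    batch += random.sample(online_buffer,
                           min(batch_size - len(batch), len(online_buffer)))
    return batch
```

Keeping the offline data in every batch anchors the policy to the demonstrations while the online half lets it improve on them, which is why the approach suits tasks like load navigation where real-world rollouts are expensive.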

Experimental Results

Load Navigation Task

HiLMa-Res was benchmarked against end-to-end methods such as Reward Shaping, AMP, and Motion Tracking. It achieved a 100% success rate in simulation-based tests and an 80% success rate in initial real-world trials. Upon further training with real-world data, the success rate improved to 100%, showcasing efficient and reliable adaptation.

Ball Dribbling and Stepping Over Stones

In the ball dribbling task, the robot successfully performed sharp U-turns in narrow spaces. For stepping over stones, the vision-based policy allowed the robot to navigate cluttered environments, avoiding obstacles with a high success rate of 87.5%.

Implications and Future Directions

The HiLMa-Res framework addresses both practical and theoretical challenges in loco-manipulation tasks, demonstrating its versatility and efficiency. Future research could explore extending this framework to more complex tasks and different robot morphologies, including humanoids. Additionally, incorporating more sophisticated environmental feedback mechanisms could further enhance real-world performance and adaptability.
