Emergent Mind

Abstract

This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to generalize across loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control with manipulation using the legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. Because this controller is task-agnostic, it can be reused by a high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks with a quadrupedal robot in the real world. These tasks span state-based and vision-based policies, and range from training purely on simulation data to learning from real-world data. Across these tasks, HiLMa-Res outperforms competing methods.

HiLMa-Res framework: hierarchical training of locomotion controller and task-specific manipulation planner for loco-manipulation tasks.

Overview

  • HiLMa-Res introduces a hierarchical RL framework for quadrupedal robots, combining task-independent locomotion control with task-specific manipulation planners, facilitating versatile loco-manipulation tasks.

  • The framework includes a novel operational space locomotion controller and utilizes Bézier curves for flexible trajectory adjustments, enabling both state-based and vision-based observation inputs for effective real-world performance.

  • HiLMa-Res demonstrates superior performance in tasks like ball dribbling and load navigation, compared to existing methods, and shows promising results in both simulation and real-world environments.

Overview of HiLMa-Res Framework for Quadrupedal Locomotion and Manipulation

The paper "HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation" presents a hierarchical framework named HiLMa-Res. The framework employs reinforcement learning (RL) to address loco-manipulation tasks using quadrupedal robots, facilitating continuous locomotion alongside manipulation capabilities.

Key Contributions

  1. Hierarchical Reinforcement Learning: HiLMa-Res introduces a hierarchical RL approach combining a task-independent locomotion controller with task-specific manipulation planners. This separation allows a single framework to be reused across various loco-manipulation tasks.
  2. Operational Space Locomotion Controller: The framework leverages a novel operational space locomotion controller capable of tracking arbitrary end-effector (toe) trajectories. This controller supports continuous mobility and is generalizable across different downstream tasks.
  3. Task-Specific Planners: HiLMa-Res employs high-level manipulation planners to address specific tasks by specifying residual trajectories and base commands for the locomotion controller. These planners utilize either state-based or vision-based observation inputs.
  4. Evaluation on Multiple Tasks: The versatility of HiLMa-Res is demonstrated through its application to tasks like ball dribbling, stepping over stones, and navigating loads. The framework shows superior performance compared to state-of-the-art methods in these tasks.
  5. Efficient Real-World Learning: The framework's structure facilitates efficient real-world learning. Particularly in the load navigation task, the HiLMa-Res framework achieves effective and efficient policy learning via RLPD, a technique for real-world reinforcement learning from pre-collected data.

Technical Details

Task-Independent Quadrupedal Locomotion Control

The locomotion controller in HiLMa-Res is designed to be broadly applicable to multiple downstream loco-manipulation tasks. It combines nominal trajectories from a Central Pattern Generator (CPG) and residual trajectories represented by Bézier curves.

  1. CPG-Based Nominal Trajectories: The CPG generates periodic swing foot trajectories parameterized by desired base velocities and turning yaw rates, facilitating operational space planning.
  2. Bézier Residual Trajectories: Bézier curves add flexibility to the nominal trajectories, allowing for fine adjustments of the swing legs while maintaining gait stability.
  3. Control Policy: This policy is trained in a simulated environment using PPO, and its goal is to track end-effector trajectories and base commands while executing a trotting gait.
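The CPG-plus-residual idea above can be sketched in a few lines. The code below is an illustrative toy, not the paper's implementation: the CPG swing profile (forward progress plus a sinusoidal height curve), the step length/height values, and the number of Bézier control points are all assumptions made for the example.

```python
import numpy as np

def bezier(control_points, s):
    """Evaluate a Bézier curve at phase s in [0, 1] via De Casteljau's algorithm."""
    pts = np.asarray(control_points, dtype=float)
    while len(pts) > 1:
        pts = (1 - s) * pts[:-1] + s * pts[1:]
    return pts[0]

def cpg_swing_trajectory(s, step_length=0.10, step_height=0.06):
    """Nominal swing-foot trajectory from a simple phase-based CPG:
    the toe advances by step_length while following a sinusoidal
    height profile over the swing phase s in [0, 1] (toy parameters)."""
    x = step_length * (s - 0.5)          # forward progress over the swing
    z = step_height * np.sin(np.pi * s)  # lift at mid-swing, touch down at s=1
    return np.array([x, 0.0, z])

def toe_target(s, residual_ctrl_pts):
    """Toe target = CPG nominal + Bézier residual: the residual lets a
    high-level planner reshape the swing without breaking gait periodicity."""
    return cpg_swing_trajectory(s) + bezier(residual_ctrl_pts, s)
```

With all-zero residual control points, the toe target reduces to the nominal CPG trajectory, so the residual acts purely as a task-driven correction.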

Task-Specific Manipulation Planning

The task-specific planners in HiLMa-Res use RL to specify action goals for the locomotion controller, leveraging both state-based and vision-based observations.

  1. State-Based Policies: Applied in tasks such as ball dribbling and load navigation, state-based policies utilize proprioceptive feedback and object positions to generate commands.
  2. Vision-Based Policies: Used in tasks like stepping over stones, these policies rely on raw depth vision inputs from onboard cameras. This allows the robot to adjust gait trajectories in real-time to avoid obstacles.
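The interface between the two levels can be made concrete with a small sketch. Everything here is hypothetical: the `PlannerAction` fields, the flat 15-dimensional policy output, and its slicing into four 3D control points plus base commands are illustrative choices, not the paper's actual action parameterization.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlannerAction:
    """Hypothetical action interface between the high-level planner
    and the low-level locomotion controller."""
    residual_ctrl_pts: np.ndarray  # Bézier control points for the swing leg
    base_velocity_cmd: np.ndarray  # desired base velocity (vx, vy)
    yaw_rate_cmd: float            # desired turning yaw rate

def plan_step(policy, observation):
    """Query a planner policy (state- or vision-based) and package its
    flat output vector into structured commands for the controller."""
    raw = policy(observation)  # e.g. a trained RL policy network
    return PlannerAction(
        residual_ctrl_pts=raw[:12].reshape(4, 3),  # 4 control points, xyz
        base_velocity_cmd=raw[12:14],
        yaw_rate_cmd=float(raw[14]),
    )
```

The point of the structure is that the same `PlannerAction` type serves both state-based and vision-based planners; only the policy's observation changes.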

Training in Simulation and Real World

The HiLMa-Res framework supports efficient training in both simulation and the real world:

  1. Simulation Training: Policies are initially trained using diverse trajectory inputs and dynamics randomization in simulation environments. This builds robust controllers capable of zero-shot transfer to real robots.
  2. Real-World Fine-Tuning: For tasks requiring precise interaction with the environment (e.g., load navigation), policies are fine-tuned using real-world data collection techniques like RLPD.
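The core data recipe of RLPD is symmetric sampling: each training batch mixes pre-collected (offline) and freshly collected (online) transitions 50/50. The sketch below illustrates that recipe only; it uses plain lists in place of proper replay buffers, and the batch size is an arbitrary example value.

```python
import random

def sample_rlpd_batch(offline_buffer, online_buffer, batch_size=256):
    """RLPD-style symmetric sampling: draw half of each training batch
    from the pre-collected offline data and half from the online replay
    buffer. Buffers here are plain lists of transitions; a real
    implementation would use ring-buffer replay with uniform sampling."""
    half = batch_size // 2
    batch = random.sample(offline_buffer, min(half, len(offline_buffer)))
    batch += random.sample(online_buffer,
                           min(batch_size - len(batch), len(online_buffer)))
    return batch
```

Keeping the offline data in every batch anchors the policy to the demonstrations while the online half lets it improve on them, which is why the approach suits tasks like load navigation where real-world rollouts are expensive.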

Experimental Results

Load Navigation Task

HiLMa-Res was benchmarked against end-to-end methods such as Reward Shaping, AMP, and Motion Tracking. It achieved a 100% success rate in simulation-based tests and an 80% success rate in initial real-world trials. Upon further training with real-world data, the success rate improved to 100%, showcasing efficient and reliable adaptation.

Ball Dribbling and Stepping Over Stones

In the ball dribbling task, the robot successfully performed sharp U-turns in narrow spaces. For stepping over stones, the vision-based policy allowed the robot to navigate cluttered environments, avoiding obstacles with a high success rate of 87.5%.

Implications and Future Directions

The HiLMa-Res framework addresses both practical and theoretical challenges in loco-manipulation tasks, demonstrating its versatility and efficiency. Future research could explore extending this framework to more complex tasks and different robot morphologies, including humanoids. Additionally, incorporating more sophisticated environmental feedback mechanisms could further enhance real-world performance and adaptability.
