Abstract

Numerous locomotion controllers based on Reinforcement Learning (RL) have been designed to facilitate blind quadrupedal locomotion over challenging terrains. Nevertheless, locomotion control remains a challenging task for quadruped robots traversing diverse terrains amidst unforeseen disturbances. Recently, privileged learning has been employed to learn reliable and robust quadrupedal locomotion over various terrains using a teacher-student architecture. However, its one-encoder structure is not adequate for addressing external force perturbations: the student policy suffers inevitable performance degradation due to the discrepancy between the feature embedding of the teacher policy's encoder and that of the student policy's encoder. Hence, this paper presents a privileged learning framework with multiple feature encoders and a residual policy network for robust and reliable quadruped locomotion subject to various external perturbations. The multi-encoder structure decouples latent features derived from different privileged information, leading to a learned policy with enhanced robustness, stability, and reliability. The effectiveness of the proposed feature encoding module is analyzed in depth using extensive simulation data. The residual policy network helps mitigate the performance degradation a student policy experiences when cloning the behaviors of a teacher policy. The proposed framework is evaluated on a Unitree GO1 robot, and extensive experiments on diverse terrains demonstrate its performance gains over a state-of-the-art privileged learning algorithm. Ablation studies illustrate the effectiveness of the residual policy network.

PA-LOCO fuses a teacher-student framework with residual networks and multiple feature encoders over three phases.

Overview

  • PA-LOCO is a framework for robust quadrupedal locomotion using reinforcement learning (RL) to manage various terrains and unexpected disturbances without dedicated force sensors.

  • Key innovations include a residual policy network to enhance student policy performance, a multi-encoder structure to decouple and stabilize various latent features, and extensive simulations validating its adaptability under external forces.

  • Experiments show PA-LOCO's superior recovery and stability across challenging terrains, indicating its potential for practical deployment in exploration, rescue, and maintenance tasks.

PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots

This paper introduces PA-LOCO, a framework for robust and adaptive quadrupedal locomotion across diverse terrains and under unforeseen disturbances. Built on reinforcement learning (RL), the approach tackles a critical challenge: achieving stable and reliable locomotion for quadruped robots under unpredictable external forces without the use of dedicated force sensors.

Methodology and Contributions

The authors propose a privileged learning framework based on a teacher-student architecture with multiple feature encoders and a residual policy network. The main innovations and contributions of the study are multi-faceted:

  1. Residual Policy Network: The residual policy network is introduced to mitigate the performance degradation typically encountered when transferring capabilities from a teacher policy to a student policy. The residual network enhances the student's policy performance in handling perturbations, specifically improving robustness and reducing recovery time.
  2. Multi-Encoder Structure: The privileged learning framework is augmented with multiple feature encoders. This architecture decouples latent features derived from various privileged information sources, such as external force perturbations, terrain profiles, and robot states. This decoupling reduces the mutual influence among different observations, leading to a more stable, robust, and reliable locomotion policy.
  3. Latent Feature Embedding: The effectiveness of the latent feature embedding is rigorously analyzed using extensive simulation data. The proposed multi-encoder structure is shown to significantly improve the discernment of external forces with varying magnitudes and directions, thereby enhancing motion stability and adaptability.
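The multi-encoder and residual-policy ideas above can be illustrated with a minimal sketch. This is not the paper's implementation: the observation dimensions, layer sizes, and the residual scaling factor are hypothetical, and random linear layers stand in for trained MLPs. The sketch shows only the structure: each privileged-information stream gets its own encoder producing a decoupled latent, and a residual network adds a correction to the base policy's action.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Random linear layer (weights, bias) standing in for a trained MLP."""
    return rng.standard_normal((out_dim, in_dim)) * 0.1, np.zeros(out_dim)

def forward(layer, x):
    W, b = layer
    return np.tanh(W @ x + b)

# One encoder per privileged-information stream (dimensions hypothetical).
enc_state   = linear(48, 16)    # proprioceptive / robot-state history
enc_terrain = linear(187, 16)   # terrain profile
enc_force   = linear(6, 8)      # external force perturbation

base_policy     = linear(16 + 16 + 8, 12)   # 12 reference joint angles
residual_policy = linear(16 + 16 + 8, 12)   # additive correction term

def act(state, terrain, force):
    # Encode each stream separately so latents stay decoupled,
    # then add a scaled residual correction to the base action.
    z = np.concatenate([forward(enc_state, state),
                        forward(enc_terrain, terrain),
                        forward(enc_force, force)])
    return forward(base_policy, z) + 0.1 * forward(residual_policy, z)

action = act(np.zeros(48), np.zeros(187), np.zeros(6))
print(action.shape)  # (12,)
```

The design point is that a perturbation only changes the force encoder's latent, rather than shifting one entangled embedding shared with terrain and state features.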

Reinforcement Learning Framework

The base reinforcement learning approach employs the Proximal Policy Optimization (PPO) algorithm, chosen for its balance between performance and computational efficiency. The training utilizes domain randomization to bridge the sim-to-real gap, encompassing randomized dynamic parameters, external force perturbations, and varying sensor noise levels. Below are the primary components of the learning model:

  • Observations: The observation space includes proprioceptive sensor data, robot state information, and external force histories.
  • Actions: Actions consist of 12-dimensional reference joint angles for the quadruped robot, integrated into a low-level PD control framework.
  • Rewards: The reward function contains multiple terms to balance task-specific objectives (e.g., velocity tracking) and auxiliary goals (e.g., smooth and efficient motion).
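The action interface above can be sketched as a standard low-level PD law converting the policy's 12-dimensional reference joint angles into joint torques. The gains `KP` and `KD` here are hypothetical placeholders, not the paper's tuned values.

```python
import numpy as np

KP, KD = 20.0, 0.5  # hypothetical proportional and derivative gains

def pd_torque(q_ref, q, qdot):
    """Low-level PD law: track the policy's reference joint angles q_ref."""
    return KP * (q_ref - q) - KD * qdot

q     = np.zeros(12)        # measured joint angles
qdot  = np.zeros(12)        # measured joint velocities
q_ref = np.full(12, 0.1)    # policy output: 12 reference joint angles
tau = pd_torque(q_ref, q, qdot)
print(tau[0])  # 2.0
```

Keeping the learned policy at the level of reference angles, with a fixed PD loop underneath, is a common design choice that makes the sim-to-real transfer less sensitive to actuator dynamics.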

Experimental Validation

The proposed PA-LOCO framework was validated both in simulation and through extensive physical experiments. Key results from the studies demonstrate that:

  • The residual policy network significantly reduces the lateral offset and recovery time after perturbation, achieving better performance compared to state-of-the-art methods.
  • The multi-encoder structure effectively distinguishes between different magnitudes and directions of external forces, leading to enhanced adaptive responses.
  • The overall locomotion policy ensures robust and stable motion across challenging terrains such as grass, slopes, and stairs, even under sudden lateral kicks.

Practical and Theoretical Implications

The introduction of multi-encoder structures in the context of privileged learning presents a significant step forward in adaptive locomotion for quadruped robots. The ability to handle perturbations and maintain stable locomotion without dedicated force sensors substantially broadens the operational capabilities of robotic systems in unstructured environments. Practically, this means more reliable and efficient deployment of quadruped robots in real-world scenarios ranging from exploration and rescue operations to routine maintenance tasks in varying terrains.

Theoretically, the paper contributes to the ongoing research in robot learning by presenting a robust integration of RL with advanced feature extraction and policy adaptation techniques. The efficacy of decoupling latent features underpins future avenues for research in modular policy architectures and the potential for deploying more complex behaviors.

Future Work

While the current framework demonstrates significant advances, future work could explore the implementation of attention mechanisms to dynamically weigh the importance of outputs from multiple encoders. Such mechanisms could further refine the adaptability and responsiveness of the locomotion policy. Additionally, real-world empirical evaluations under varied and more extreme conditions could provide deeper insights into the long-term stability and robustness of PA-LOCO.

In summary, PA-LOCO represents a comprehensive approach for robust quadruped locomotion, offering key insights and methodologies that push the boundaries of what can be achieved with learning-based control under uncertain and variable external perturbations.
