Learning H-Infinity Locomotion Control

(arXiv:2404.14405)
Published Apr 22, 2024 in cs.RO

Abstract

Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based methods only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber, and ensure their optimization with an $H_{\infty}$ constraint. In contrast to the actor, which maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., the "cost" in each iteration. To keep joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint bounds the ratio of the cost to the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it were a bipedal robot. The simulated quantitative results show improvement over baselines, demonstrating the effectiveness of the method and each design choice. Meanwhile, real-robot experiments qualitatively exhibit how robust the policy is when subjected to various disturbances on various terrains, including stairs, high platforms, slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.

Figure: Demonstration of a controller managing bipedal standing tasks under different disturbances.

Overview

  • This paper introduces a novel approach for controlling quadruped locomotion using H-Infinity control, emphasizing resilience against disturbances.

  • The method employs a dual optimization strategy with an 'actor' to perform tasks and a 'disturber' that introduces challenges, allowing the robot to handle unpredictable environmental factors effectively.

  • Proximal Policy Optimization (PPO) is utilized within a robust simulation framework to manage the adversarial dynamics created by the actor and disturber.

  • The approach demonstrates significant improvements in robot stability and adaptiveness across various terrains in both simulated and real-world testing environments, showing promise for future broader robotic applications.

Learning H-Infinity Locomotion Control for Quadruped Robots

Introduction

Robotics has been advancing rapidly, particularly in learning-based methods for controlling quadruped locomotion. These advances are largely driven by large-scale parallel training environments paired with neural controllers, enabling robust movement over complex terrains. Yet ensuring resilience against unforeseen disturbances remains a critical challenge, vital for real-world applications such as disaster recovery or navigation of unstructured terrain. Traditional solutions often rely on basic domain randomization during training, which does not fully prepare robots for variable real-world disturbances.

This paper presents an approach that models the learning process as an adversarial interaction, enhancing robustness through H-Infinity control. The key element is a dual optimization strategy involving an "actor" that performs the task and a "disturber" that challenges task execution under controlled, escalating disturbance scenarios. This dynamic is stabilized by integrating $H_{\infty}$ constraints to maintain a balance between performance degradation and disturbance intensity.

Core Methodology

The system centers on the interaction between the actor, trained to maximize the discounted overall reward, and the disturber, optimized to maximize the cost, i.e., the gap between the oracle task reward and the reward actually achieved. Both policies are trained with Proximal Policy Optimization (PPO) in a large-scale parallel simulation built on Isaac Gym; a schematic of this alternating update is sketched below.
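
To make the alternating structure concrete, here is a minimal sketch of one such training loop, assuming toy dynamics, a stubbed-out `ppo_update`, and a hypothetical ratio bound `eta`; none of these names or values come from the paper's released code.

```python
import numpy as np

# Minimal sketch of the alternating actor/disturber optimization described above.
# The toy dynamics, function names, and the bound `eta` are illustrative assumptions,
# not the authors' implementation.

rng = np.random.default_rng(0)

def rollout(actor, disturber, horizon=64):
    """Collect a toy trajectory in which the disturber injects an external force each step."""
    rewards, costs, forces = [], [], []
    state = rng.standard_normal(4)
    for _ in range(horizon):
        action = actor(state)
        force = disturber(state)
        state = state + 0.05 * action + 0.05 * force
        task_reward = -float(np.linalg.norm(state))    # reward actually achieved
        oracle_reward = 0.0                            # reward an undisturbed, perfect tracker would get
        rewards.append(task_reward)
        costs.append(oracle_reward - task_reward)      # "cost": degradation caused by the disturbance
        forces.append(force)
    return np.array(rewards), np.array(costs), np.array(forces)

def ppo_update(policy, advantages):
    """Placeholder for a PPO step (clipped surrogate loss, value baseline, etc.)."""
    return policy

actor = lambda s: -0.8 * s                             # stand-in actor policy
disturber = lambda s: np.clip(0.3 * s, -1.0, 1.0)      # stand-in disturber policy
eta = 2.0                                              # assumed bound on cost / disturbance intensity

for iteration in range(10):
    rewards, costs, forces = rollout(actor, disturber)

    # Actor: maximize the discounted task reward, as in standard PPO.
    actor = ppo_update(actor, advantages=rewards)

    # Disturber: maximize the cost, but only reinforce it while the H-infinity
    # ratio constraint (total cost / total force intensity <= eta) still holds.
    intensity = float((forces ** 2).sum())
    if costs.sum() / (intensity + 1e-8) <= eta:
        disturber = ppo_update(disturber, advantages=costs)
```

In the actual method, both policies are neural networks updated with full PPO on massively parallel Isaac Gym rollouts; the sketch only mirrors the ordering of the updates and the role of the cost signal.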

H-Infinity Constraint Implementation

The $H_{\infty}$ constraint is central to this framework, ensuring a bounded ratio between the cost inflicted by the disturber and the intensity of the external forces applied. This methodological backbone not only enhances the disturbance-handling capability of the learned policy but also provides theoretical robustness bounds, underpinning stability across learning iterations.
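
Read against the abstract, the constraint bounds the ratio of the accumulated cost to the accumulated intensity of the injected forces. The formulation below is an illustrative rendering with assumed symbols ($C_t$ for the per-step cost, $d_t$ for the disturber's force at step $t$, $\eta$ for the prescribed bound), not the paper's exact notation.

```latex
% Illustrative form of the ratio bound (symbols assumed, not taken verbatim from the paper):
%   C_t : per-step cost, i.e., oracle task reward minus achieved task reward
%   d_t : external force injected by the disturber at step t
%   eta : prescribed bound enforced during joint optimization
\[
  \frac{\sum_{t=0}^{T} C_t}{\sum_{t=0}^{T} \lVert d_t \rVert^{2}} \;\le\; \eta
\]
```

Keeping this ratio bounded prevents the disturber from being rewarded for pushes that are disproportionately damaging relative to their intensity, which is what keeps the joint optimization between actor and disturber stable.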

Simulation and Real-World Testing Environments

Multiple test scenarios are evaluated, ranging from continuous to sudden, high-intensity disturbances in simulation, and extending to complex physical terrains such as slippery slopes and uneven surfaces. Across these settings, the robot showed marked gains in stability and adaptability over traditional methods, particularly under highly disruptive conditions meant to emulate real-world operational challenges.

Results and Observations

Quantitative gains are noted across the testing scenarios, demonstrating superior task performance and disturbance handling by robots trained under the proposed $H_{\infty}$ paradigm. Specifically, the adaptive disturber under $H_{\infty}$ constraints allowed a nuanced calibration of disturbances that aligns with the robot's current state and learning progress, ultimately contributing to more robust locomotion control.

In real-world tests, deployment on Unitree robots (Aliengo, and A1 for the bipedal-standing task) confirmed the practical applicability of the proposed approach: the learned policy effectively handled real-world disturbances across diverse terrains, showcasing both resilience and agility.

Conclusion and Future Work

The integration of H-Infinity control in the training of neural network-based controllers for robotic locomotion presents a significant step toward robust autonomous operation in dynamic and unpredictable environments. The success of these methods in simulation and real-world trials provides a promising outlook for future applications in various industrial and rescue operations. Further research might explore the extension of these principles to other robotic configurations, such as bipedal or aerial robots, potentially transforming broader areas of robotics where adaptability and resilience are critical.

This approach invites deeper exploration of adaptive and resilient machine learning techniques whose robustness is not merely theoretical but demonstrated under physically demanding conditions that mimic the real world.
