- The paper demonstrates a model-free deep reinforcement learning (DRL) approach that learns robust walking policies for the Cassie biped, moving beyond the limits of traditional model-based control.
- Using an actor-critic algorithm, Proximal Policy Optimization, the approach trains controllers that tolerate sensor delays, external perturbations, and uneven terrain, handling sinusoidal terrain height variations up to 0.15 m.
- Policy interpolation enables dynamic velocity modulation, underscoring the potential for DRL to advance bipedal robotic applications in real-world, unpredictable environments.
Feedback Control for Cassie with Deep Reinforcement Learning
The development of robust control strategies for bipedal robots such as Cassie remains a significant challenge in robotics, primarily due to the instability and underactuation inherent in legged locomotion. Historically, control approaches have leveraged model-based techniques, using local linearization of the dynamics coupled with reduced-order abstractions, which limits how fully the robot's dynamic capabilities can be exploited. These simplifications ease the control-design computation but exclude nonlinear effects and hard constraints such as torque limits and joint-angle ranges, which can result in suboptimal performance.
In contrast, the paper under discussion presents a model-free approach, employing Deep Reinforcement Learning (DRL) to develop walking controllers for a bipedal robot. This methodology sidesteps the limitations of traditional model-based strategies by letting DRL exploit the full dynamic complexity of the Cassie biped. The key step is formulating the control problem as a Markov Decision Process (MDP), which allows learning policies that imitate reference motions.
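As a rough illustration of this MDP framing, the sketch below scores how closely the current joint state tracks the reference motion with an imitation-style reward; the weights, scales, and tracked quantities are illustrative assumptions, not the paper's exact reward.

```python
# Minimal sketch of an imitation-style reward for the MDP formulation.
# The tracked quantities and weights are assumptions for illustration only.
import numpy as np

def imitation_reward(q, qdot, q_ref, qdot_ref, w_pos=0.7, w_vel=0.3):
    """Reward approaches 1 when joint positions/velocities track the reference."""
    pos_term = np.exp(-2.0 * np.sum((q - q_ref) ** 2))
    vel_term = np.exp(-0.1 * np.sum((qdot - qdot_ref) ** 2))
    return w_pos * pos_term + w_vel * vel_term

# Example: perfect tracking yields the maximum reward of 1.0.
q = qdot = np.zeros(10)
print(imitation_reward(q, qdot, q, qdot))  # -> 1.0
```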
Methodology and Results
The researchers use DRL to train controllers that imitate two-step reference motions for Cassie, achieving robustness demonstrated through blind walking on uneven terrain, coping with sensory delays, and rejecting unexpected perturbations. This approach avoids both model simplification and the hand-design of nominal trajectories, allowing a richer exploration of the robot's capabilities.
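One common way to realize such imitation is to index a cyclic two-step reference motion by a phase variable that the policy also observes; the sketch below shows one plausible indexing scheme (the linear interpolation and wrap-around details are assumptions, not taken from the paper).

```python
# Sketch of indexing a cyclic reference motion by a phase variable in [0, 1).
# The interpolation scheme is an assumed implementation, for illustration.
import numpy as np

def reference_pose(motion, phase):
    """Linearly interpolate a (T, n_joints) reference motion at the given phase."""
    T = motion.shape[0]
    x = (phase % 1.0) * T
    i = int(x) % T
    j = (i + 1) % T            # wrap around so the gait cycle repeats
    frac = x - int(x)
    return (1 - frac) * motion[i] + frac * motion[j]

# Toy cyclic motion: 40 frames, 10 joints.
motion = np.sin(np.linspace(0, 2 * np.pi, 40))[:, None] * np.ones((1, 10))
print(reference_pose(motion, 0.25).shape)  # (10,) joint targets at quarter cycle
```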
The training process uses an actor-critic algorithm, specifically Proximal Policy Optimization (PPO), to optimize a policy parametrized by a multi-layer neural network whose inputs include joint angles and other state feedback. The learned policy proves highly resilient to disturbances: whereas a heuristic-based controller succumbs to perturbations and fails on sinusoidal terrain with 0.07 m height variation, the DRL-based policy manages variations up to 0.15 m, indicating superior adaptability.
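For concreteness, a minimal actor-critic setup in this spirit might look like the following sketch; the network sizes, activations, and library choice are assumptions, while the clipped surrogate objective is the standard PPO loss.

```python
# Minimal actor-critic sketch in the spirit of the paper's setup (architecture
# details are assumptions; the clipped objective is standard PPO).
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),          # mean of Gaussian over actions
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                # state-value estimate
        )

    def forward(self, obs):
        mean = self.actor(obs)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        return dist, self.critic(obs)

def ppo_loss(dist, actions, old_logp, advantages, clip=0.2):
    """PPO clipped surrogate objective over a batch of transitions."""
    logp = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```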
Additionally, the paper explores augmenting control via policy interpolation, allowing the robot to modulate its velocity dynamically and tackle more challenging terrain with some success. This adaptability underscores the flexibility and robustness that the DRL framework can offer over more traditional gait-library approaches.
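A simple scheme consistent with this idea blends the actions of two policies trained for different target speeds; the linear blending rule and the speed values below are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch of policy interpolation for velocity modulation: linearly mix the
# actions of two fixed-speed policies. Blending rule and speeds are assumptions.
import numpy as np

def interpolated_action(obs, policy_slow, policy_fast, v_des, v_slow=0.5, v_fast=1.0):
    """Mix joint targets from two fixed-speed policies for a desired speed."""
    alpha = np.clip((v_des - v_slow) / (v_fast - v_slow), 0.0, 1.0)
    return (1 - alpha) * policy_slow(obs) + alpha * policy_fast(obs)

# Example with stand-in policies that return constant joint targets:
slow = lambda obs: np.zeros(10)
fast = lambda obs: np.ones(10)
print(interpolated_action(None, slow, fast, v_des=0.75))  # halfway blend -> 0.5 each
```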
Implications and Future Directions
From a practical perspective, this research could have substantial implications for deploying bipedal robots in real-world environments, where unpredictability and variability demand robust, adaptive control. The success of the approach on the simulated Cassie model suggests potential for broader application across robot types and scenarios, although careful treatment of real-world dynamics and sensor noise is necessary to mitigate the simulation-to-reality gap.
Theoretically, these developments extend reinforcement learning's applicability to high-dimensional control tasks, demonstrating that DRL can move beyond standard benchmarks into complex robotic applications. Future research could focus on real-world deployment, adaptation to measured sensory inputs rather than full state feedback, and integration of visual terrain sensing to further improve controller performance.
Overall, the paper illustrates the promise of DRL for robotics, with the potential to redefine autonomous navigation and interaction in unpredictable settings, although practical deployment across varied environments remains a significant open challenge.