Abstract

Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that outputs direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E shows a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting the potential of end-to-end reinforcement learning. The performance drop observed from simulation to reality leaves room for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.

Overview

  • Researchers developed an end-to-end reinforcement learning system for time-optimal control of quadcopters, aiming to bridge the sim-to-real gap.

  • The system offers direct motor commands and features a learned residual model with an adaptive method to compensate for modeling inaccuracies.

  • Experiments were conducted using a Parrot Bebop 1 quadcopter, utilizing real-time onboard computation and precise motion capture data.

  • The E2E network outperformed state-of-the-art approaches in both simulations and real-world tests, especially during the initial lap from a hover.

  • Findings suggest future improvements could involve offline reinforcement learning and accounting for further model discrepancies.

Introduction

Autonomous quadcopters have become essential in applications that demand rapid, agile flight. Achieving time-optimal control for these vehicles is complicated by the sim-to-real gap: the difficulty of transferring behaviors learned in simulation to real-world flight.

Methodology

Researchers have developed an end-to-end reinforcement learning (E2E RL) system that outputs motor commands directly, without relying on a low-level inner loop controller. To bridge the reality gap, the approach combines a learned residual model with an adaptive method that compensates for modeling inaccuracies in thrust and moments.
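To make the compensation idea concrete, the sketch below shows one plausible way to combine a nominal force/moment model with a small learned residual network and a simple online estimator of thrust and moment effectiveness. All names, dimensions, coefficients, and the adaptation law are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact formulation): a nominal
# quadcopter force/moment model corrected by a small learned residual
# network, plus a simple online estimator of thrust/moment scaling.
import torch
import torch.nn as nn

class ResidualDynamics(nn.Module):
    """Predicts thrust and body moments as nominal model + learned residual."""

    def __init__(self, state_dim=13, action_dim=4, hidden=64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 4),  # correction to [thrust, Mx, My, Mz]
        )

    def nominal(self, state, motor_cmd):
        # Placeholder nominal model: collective thrust proportional to the
        # sum of squared motor commands; a real model would use identified
        # thrust/torque coefficients and the motor mixing matrix.
        k_thrust = 4.0e-5
        thrust = k_thrust * (motor_cmd ** 2).sum(dim=-1, keepdim=True)
        moments = torch.zeros(state.shape[0], 3)
        return torch.cat([thrust, moments], dim=-1)

    def forward(self, state, motor_cmd):
        x = torch.cat([state, motor_cmd], dim=-1)
        return self.nominal(state, motor_cmd) + self.residual(x)

class EffectivenessEstimator:
    """First-order tracker of multiplicative errors on thrust and moments."""

    def __init__(self, gain=0.05):
        self.scale = torch.ones(4)  # [thrust, Mx, My, Mz]
        self.gain = gain

    def update(self, predicted, measured):
        # Nudge each scale toward the measured/predicted ratio, guarding
        # against division by near-zero predictions.
        safe = torch.where(predicted.abs() < 1e-6,
                           torch.full_like(predicted, 1e-6), predicted)
        ratio = measured / safe
        self.scale = self.scale + self.gain * (ratio - self.scale)
        return self.scale
```

In flight, the estimated scales would multiply the model's predicted thrust and moments so the controller sees a dynamics model closer to the real vehicle.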

The methodology covers the quadcopter model, the E2E network and its training strategy, and a baseline network that commands thrust and body rates to an incremental nonlinear dynamic inversion (INDI) inner loop controller. Each network has a distinct architecture and inputs specific to its operation within a Markov Decision Process (MDP) framework.
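As a rough illustration of the two action interfaces, the snippet below sketches a shared actor architecture. Both variants happen to output four numbers, but the E2E policy's outputs are individual motor commands while the baseline's are collective thrust and three body rates handed to the INDI inner loop. The layer sizes and observation contents are assumptions, not the paper's reported architecture.

```python
# Hypothetical actor sketch for both controller variants.
import torch
import torch.nn as nn

def make_actor(obs_dim: int) -> nn.Module:
    # Both action spaces are 4-D: four motor commands for the E2E
    # policy, or [collective thrust, p, q, r] for the INDI baseline.
    return nn.Sequential(
        nn.Linear(obs_dim, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 4), nn.Tanh(),  # squash actions into [-1, 1]
    )

# Example: an observation stacking pose, velocity, attitude, and gate info.
actor = make_actor(obs_dim=18)
obs = torch.randn(1, 18)
action = actor(obs)  # rescaled to motor or body-rate limits downstream
```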

Experimental Setup

The practical application of the E2E and INDI networks was tested using a Parrot Bebop 1 quadcopter within a controlled environment. This quadcopter was chosen for its unique flexible frame, presenting a non-trivial scenario for the networks to operate within. The setup used real-time computation aboard the Bebop's processor and an OptiTrack system to provide precise motion data.

Results & Discussion

The E2E approach showed significant promise. In simulation, the E2E network completed the track 1.39 seconds faster than the state-of-the-art approach, and it held a 0.17-second lead in real-world tests. This advantage was concentrated in the initial lap, which starts from a hover; the networks' performance converged in the following laps. While both techniques proved robust in simulation, real-world flights revealed a more pronounced sim-to-real gap, especially for the E2E framework, suggesting its greater sensitivity to modeling errors.

The E2E network's direct handling of motor commands and real-time adaptation open a promising avenue for further research into quadcopter performance. Refining the network with offline reinforcement learning on real-world flight data, and accounting for further model discrepancies such as battery voltage or variance in maximum motor RPM, could yield additional gains.
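As one hedged illustration of how real flight logs could feed back into the pipeline, the sketch below fits the hypothetical residual model from the earlier sketch on recorded (state, motor command, measured force/moment) transitions. This is plain supervised model refinement, a precursor to rather than a substitute for the offline RL the authors suggest, and all names are assumptions.

```python
# Sketch: refining the learned residual on logged real-flight data.
# Assumes the hypothetical ResidualDynamics class defined earlier.
import torch

def refine_on_logs(model, states, motor_cmds, measured, epochs=100, lr=1e-3):
    """Fit the residual network so model predictions match measured
    thrust and moments from real flights."""
    opt = torch.optim.Adam(model.residual.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(states, motor_cmds)
        loss = torch.nn.functional.mse_loss(pred, measured)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```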
