Abstract

Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that outputs direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E shows a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting the potential of end-to-end reinforcement learning. The performance drop observed from simulation to reality leaves room for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.

Overview

  • Researchers developed an end-to-end reinforcement learning system for time-optimal control of quadcopters, aiming to bridge the sim-to-real gap.

  • The system offers direct motor commands and features a learned residual model with an adaptive method to compensate for modeling inaccuracies.

  • Experiments were conducted using a Parrot Bebop 1 quadcopter, utilizing real-time onboard computation and precise motion capture data.

  • The E2E network outperformed state-of-the-art approaches in both simulations and real-world tests, especially during the initial lap from a hover.

  • Findings suggest future improvements could involve offline reinforcement learning and accounting for further model discrepancies.

Introduction

Autonomous quadcopters have become essential in applications that demand rapid, agile flight. Achieving time-optimal control for these vehicles is complicated by the sim-to-real gap: the difficulty of transferring behaviors learned in simulation to real-world flight.

Methodology

Researchers have developed an end-to-end reinforcement learning (E2E RL) system that outputs motor commands directly, without relying on a low-level inner loop controller. To bridge the reality gap, the approach combines a learned residual model with an adaptive method that compensates for modeling inaccuracies in thrust and moments.
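To make the compensation idea concrete, the sketch below shows one plausible way to combine a nominal force/moment model with a small learned residual network and a simple online estimator of thrust and moment effectiveness. All names, dimensions, coefficients, and the adaptation law are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact formulation): a nominal
# quadcopter force/moment model corrected by a small learned residual
# network, plus a simple online estimator of thrust/moment scaling.
import torch
import torch.nn as nn

class ResidualDynamics(nn.Module):
    """Predicts thrust and body moments as nominal model + learned residual."""

    def __init__(self, state_dim=13, action_dim=4, hidden=64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 4),  # correction to [thrust, Mx, My, Mz]
        )

    def nominal(self, state, motor_cmd):
        # Placeholder nominal model: collective thrust proportional to the
        # sum of squared motor commands; a real model would use identified
        # thrust/torque coefficients and the motor mixing matrix.
        k_thrust = 4.0e-5
        thrust = k_thrust * (motor_cmd ** 2).sum(dim=-1, keepdim=True)
        moments = torch.zeros(state.shape[0], 3)
        return torch.cat([thrust, moments], dim=-1)

    def forward(self, state, motor_cmd):
        x = torch.cat([state, motor_cmd], dim=-1)
        return self.nominal(state, motor_cmd) + self.residual(x)

class EffectivenessEstimator:
    """First-order tracker of multiplicative errors on thrust and moments."""

    def __init__(self, gain=0.05):
        self.scale = torch.ones(4)  # [thrust, Mx, My, Mz]
        self.gain = gain

    def update(self, predicted, measured):
        # Nudge each scale toward the measured/predicted ratio, guarding
        # against division by near-zero predictions.
        safe = torch.where(predicted.abs() < 1e-6,
                           torch.full_like(predicted, 1e-6), predicted)
        ratio = measured / safe
        self.scale = self.scale + self.gain * (ratio - self.scale)
        return self.scale
```

In flight, the estimated scales would multiply the model's predicted thrust and moments so the controller sees a dynamics model closer to the real vehicle.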

The methodology covers the quadcopter model, the E2E network and its training strategy, and a baseline network that commands thrust and body rates to an incremental nonlinear dynamic inversion (INDI) inner loop controller. Each network has a distinct architecture and inputs specific to its operation within a Markov Decision Process (MDP) framework.
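As a rough illustration of the two action interfaces, the snippet below sketches a shared actor architecture. Both variants happen to output four numbers, but the E2E policy's outputs are individual motor commands while the baseline's are collective thrust and three body rates handed to the INDI inner loop. The layer sizes and observation contents are assumptions, not the paper's reported architecture.

```python
# Hypothetical actor sketch for both controller variants.
import torch
import torch.nn as nn

def make_actor(obs_dim: int) -> nn.Module:
    # Both action spaces are 4-D: four motor commands for the E2E
    # policy, or [collective thrust, p, q, r] for the INDI baseline.
    return nn.Sequential(
        nn.Linear(obs_dim, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 4), nn.Tanh(),  # squash actions into [-1, 1]
    )

# Example: an observation stacking pose, velocity, attitude, and gate info.
actor = make_actor(obs_dim=18)
obs = torch.randn(1, 18)
action = actor(obs)  # rescaled to motor or body-rate limits downstream
```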

Experimental Setup

The practical application of the E2E and INDI networks was tested using a Parrot Bebop 1 quadcopter within a controlled environment. This quadcopter was chosen for its unique flexible frame, presenting a non-trivial scenario for the networks to operate within. The setup used real-time computation aboard the Bebop's processor and an OptiTrack system to provide precise motion data.

Results & Discussion

The E2E approach showed significant promise. In simulation, the E2E network completed the track 1.39 seconds faster than the state-of-the-art approach, and it held a 0.17-second lead in real-world tests. This advantage was concentrated in the initial lap, which starts from a hover; the networks' performance converged in the following laps. While both techniques proved robust in simulation, real-world flights revealed a more pronounced sim-to-real gap, especially for the E2E framework, suggesting its greater sensitivity to modeling errors.

The E2E network's direct handling of motor commands and real-time adaptation open a promising avenue for further research into quadcopter performance. Refining the network with offline reinforcement learning on real-world flight data, and accounting for further model discrepancies such as battery voltage or variance in maximum motor RPM, could yield additional gains.
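As one hedged illustration of how real flight logs could feed back into the pipeline, the sketch below fits the hypothetical residual model from the earlier sketch on recorded (state, motor command, measured force/moment) transitions. This is plain supervised model refinement, a precursor to rather than a substitute for the offline RL the authors suggest, and all names are assumptions.

```python
# Sketch: refining the learned residual on logged real-flight data.
# Assumes the hypothetical ResidualDynamics class defined earlier.
import torch

def refine_on_logs(model, states, motor_cmds, measured, epochs=100, lr=1e-3):
    """Fit the residual network so model predictions match measured
    thrust and moments from real flights."""
    opt = torch.optim.Adam(model.residual.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(states, motor_cmds)
        loss = torch.nn.functional.mse_loss(pred, measured)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```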
