Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning

Published 16 Sep 2021 in cs.RO | (2109.07735v2)

Abstract: We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors. Code and video demonstrations are available on the project website at https://sites.google.com/view/swarm-rl.




Summary

The paper introduces an end-to-end deep reinforcement learning approach for decentralized control of quadrotor swarms, achieving zero-shot sim-to-real transfer.
It leverages neural architectures like deep sets and attention mechanisms to process local observations, with attention networks excelling in dynamic collision avoidance.
Experimental results demonstrate scalability and real-world viability: the policies control up to 128 simulated quadrotors and were deployed successfully on Crazyflie 2.0 platforms.


The paper "Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning" presents a method for learning quadrotor swarm control policies with multi-agent deep reinforcement learning (DRL). The policies are trained entirely in simulation and transfer zero-shot to real quadrotors. Each drone runs its own neural network policy and acts on local observations alone, so the swarm needs neither full-state information nor extensive real-time computation. This broadens the range of complex, dynamically changing environments in which quadrotor swarms can operate.
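To make "local observations" concrete, here is a minimal sketch of how one drone's view of its neighborhood might be assembled. It assumes each drone observes the relative positions and velocities of its `k` nearest neighbors; the function name `local_observation`, the value of `k`, and the 6-dimensional per-neighbor layout are illustrative assumptions, not details taken from this summary.

```python
import numpy as np

def local_observation(positions, velocities, agent, k=2):
    """Relative position and velocity of the k nearest neighbors of `agent`.

    positions, velocities: arrays of shape (n_drones, 3).
    Returns an array of shape (k, 6): [rel_pos (3), rel_vel (3)] per neighbor.
    The k-nearest-neighbor rule is an assumption made for this sketch.
    """
    rel_pos = positions - positions[agent]
    rel_vel = velocities - velocities[agent]
    dist = np.linalg.norm(rel_pos, axis=1)
    dist[agent] = np.inf              # a drone is not its own neighbor
    nearest = np.argsort(dist)[:k]    # indices of the k closest drones
    return np.concatenate([rel_pos[nearest], rel_vel[nearest]], axis=1)

positions = np.array([[0.0, 0.0, 1.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 3.0, 1.0],
                      [0.5, 0.0, 1.0]])
velocities = np.zeros((4, 3))
obs = local_observation(positions, velocities, agent=0, k=2)
```

Because each drone only needs information about a few nearby drones, the size of the observation does not grow with the size of the swarm.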

Methodology and Approach

The authors formulate the problem as a set of quadrotors, each of which must minimize its distance to a desired position while avoiding collisions. They employ end-to-end DRL: a single learned policy maps each drone's observations directly to low-level motor commands. Training is large-scale, involving hundreds of millions of environment transitions in a detailed physics simulator. The simulator is designed to mimic real-world conditions closely, including non-ideal motor behavior and noisy sensor readings, to improve the chances of successful sim-to-real transfer.
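The objective described above (stay close to the goal, avoid collisions) can be sketched as a per-step reward. This is a simplified illustration only: the function name `reward`, the weights `w_dist` and `w_collision`, and the `collision_radius` value are hypothetical, and the reward used in the paper may include further terms.

```python
import numpy as np

def reward(position, goal, neighbor_positions,
           collision_radius=0.1, w_dist=1.0, w_collision=5.0):
    """Negative cost: distance to goal plus a penalty per colliding neighbor.

    All weights and the collision radius are placeholder values chosen
    for this sketch, not the paper's settings.
    """
    dist_cost = w_dist * np.linalg.norm(goal - position)
    if len(neighbor_positions):
        gaps = np.linalg.norm(neighbor_positions - position, axis=1)
        n_collisions = int(np.sum(gaps < collision_radius))
    else:
        n_collisions = 0
    return -(dist_cost + w_collision * n_collisions)

position = np.zeros(3)
goal = np.array([3.0, 4.0, 0.0])
neighbors = np.array([[0.05, 0.0, 0.0],   # closer than collision_radius
                      [1.00, 0.0, 0.0]])  # safely separated
r = reward(position, goal, neighbors)
```

In this example the drone is 5 units from its goal and collides with one neighbor, so the reward combines one distance term and one collision penalty.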

Two main neural architectures for processing local observations are investigated: deep sets and attention mechanisms. Both compute a representation of the drone's neighborhood, which is critical for navigating and avoiding collisions. The deep sets architecture offers permutation invariance and scalability, while the attention mechanism lets the policy prioritize the neighbors that matter most at a given moment, improving collision avoidance. In the evaluation, attention-based networks perform better, especially in scenarios that require dense formations and dynamic interactions, such as evader pursuit.
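The difference between the two encoders can be illustrated with a small NumPy sketch. It is not the paper's network: the single-layer `embed` function, the random weights `W_embed` and `w_score`, and the embedding size of 8 are assumptions made to keep the example short. Deep sets pools neighbor embeddings with a plain mean, while the attention variant pools them with learned, input-dependent weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W_embed = rng.normal(size=(6, 8))  # hypothetical per-neighbor embedding weights
w_score = rng.normal(size=8)       # hypothetical attention scoring vector

def embed(neighbors):
    """Embed each neighbor independently: (k, 6) -> (k, 8)."""
    return np.tanh(neighbors @ W_embed)

def deep_sets_encode(neighbors):
    """Deep sets: mean-pool the embeddings, so neighbor order cannot matter."""
    return embed(neighbors).mean(axis=0)

def attention_encode(neighbors):
    """Attention: softmax-weighted sum, so some neighbors count more than others."""
    e = embed(neighbors)
    scores = e @ w_score
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ e

neighbors = rng.normal(size=(4, 6))  # 4 neighbors, 6 features each
shuffled = neighbors[::-1]           # same neighbors, reversed order

ds_a, ds_b = deep_sets_encode(neighbors), deep_sets_encode(shuffled)
at_a, at_b = attention_encode(neighbors), attention_encode(shuffled)
```

Both encoders return the same fixed-size vector regardless of neighbor order; the attention version additionally lets the weights depend on each neighbor's features, for example to emphasize a fast-approaching drone.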

Results

The trained policies exhibit sophisticated behaviors, including aggressive maneuvering, formation swapping, and dynamic obstacle avoidance. These behaviors are verified in several simulated scenarios: static and dynamic formation maintenance, swarm-vs-swarm goal swapping, and evader pursuit. The policies achieve low collision rates and adapt to changes in the environment without pre-programmed motion plans.

Furthermore, the learned policies continue to perform well when scaled up to larger swarms with minimal retraining. This is evaluated with up to 128 quadrotors, indicating that the models are scalable and robust. Larger swarms do exhibit higher collision rates, primarily because a single collision can cascade and affect nearby agents.
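A collision metric of the kind discussed here can be computed from a snapshot of drone positions by counting pairs that are closer than some threshold. The sketch below is one plausible way to do that; the function name `count_collisions` and the `collision_radius` value are assumptions, and the paper's exact collision criterion is not given in this summary.

```python
import numpy as np

def count_collisions(positions, collision_radius=0.1):
    """Count unordered drone pairs closer than `collision_radius`.

    positions: array of shape (n_drones, 3).
    The threshold is a placeholder value for this sketch.
    """
    diff = positions[:, None, :] - positions[None, :, :]  # (n, n, 3)
    dist = np.linalg.norm(diff, axis=-1)                  # (n, n)
    i, j = np.triu_indices(len(positions), k=1)           # each pair once
    return int(np.sum(dist[i, j] < collision_radius))

positions = np.array([[0.0, 0.00, 0.0],
                      [0.05, 0.00, 0.0],   # collides with drone 0
                      [1.0, 0.00, 0.0],
                      [1.0, 0.05, 0.0]])   # collides with drone 2
n_pairs = count_collisions(positions)
```

The number of pairs grows quadratically with swarm size, which is one reason a fixed per-pair collision probability translates into more collisions in larger swarms.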

Real-world Deployment

The authors carry their simulation results over to physical quadrotors by deploying the learned policies on the Crazyflie 2.0 platform. Despite tight constraints on onboard computation and communication, the policies control up to eight quadrotors performing coordinated tasks in shared airspace while retaining strong collision avoidance. A reduced neural network architecture allows the policy to run onboard at high frequency, underscoring that the approach works in real time.

Implications and Future Work

The paper marks a significant step towards robust deployment of drone swarms in uncertain environments, without reliance on high-capacity computing resources or exhaustively pre-planned trajectories. Such frameworks have broader implications for fields that demand autonomous operation, such as search and rescue, environmental monitoring, and logistics.

Future work could improve the scalability of DRL policies by integrating Graph Neural Networks (GNNs), enabling distributed decision-making with broader shared state awareness while keeping execution decentralized. Another direction is hierarchical models that adjust control complexity dynamically according to task difficulty or environmental constraints.

In summary, this work shows how end-to-end DRL can overcome limitations of traditional methods and move multi-robot systems towards efficient, autonomous deployment in real-world scenarios, with the potential to scale to large swarm sizes.
