Abstract

This work presents a modular and parallelizable multi-agent deep reinforcement learning framework for instilling cooperative as well as competitive behaviors in autonomous vehicles. We introduce the AutoDRIVE Ecosystem as an enabler to develop physically accurate and graphically realistic digital twins of Nigel and F1TENTH, two scaled autonomous vehicle platforms with unique qualities and capabilities, and leverage this ecosystem to train and deploy multi-agent reinforcement learning policies. We first investigate an intersection-traversal problem using a set of cooperative vehicles (Nigel) that share limited state information with each other in single- as well as multi-agent learning settings using a common-policy approach. We then investigate an adversarial head-to-head autonomous racing problem using a different set of vehicles (F1TENTH) in a multi-agent learning setting using an individual-policy approach. In both sets of experiments, a decentralized learning architecture was adopted, which allowed robust training and testing in stochastic environments, since the agents were mutually independent and exhibited asynchronous motion behavior. The problems were further complicated by providing the agents with sparse observation spaces and requiring them to sample control commands that implicitly satisfied the imposed kinodynamic as well as safety constraints. The experimental results for both problem statements are reported in terms of quantitative metrics and qualitative remarks for the training as well as deployment phases.

Figure: Cooperative multi-agent reinforcement learning using Nigel.

Overview

  • The paper proposes a modular, parallelizable multi-agent deep reinforcement learning (MARL) framework to promote cooperative and competitive behaviors in autonomous vehicles using the AutoDRIVE Ecosystem.

  • The research simulates realistic vehicle dynamics, sensors, and actuators to train and deploy MARL policies in both single-agent and multi-agent settings for cooperative intersection traversal and competitive autonomous racing scenarios.

  • This study shows potential advancements in intelligent transportation systems and autonomous racing, suggesting future research directions such as sim2real transfer for deploying simulation-trained policies in real-world vehicles.

Multi-Agent Deep Reinforcement Learning for Cooperative and Competitive Autonomous Vehicles using AutoDRIVE Ecosystem

This essay provides a detailed analysis and summary of a paper that presents a modular and parallelizable multi-agent deep reinforcement learning (MARL) framework tailored for developing cooperative and competitive behaviors in autonomous vehicles. The research employs the AutoDRIVE Ecosystem to create digital twins of two autonomous vehicle platforms, Nigel and F1TENTH, each with distinctive characteristics.

Digital Twin Creation Using AutoDRIVE Ecosystem

The authors leveraged the AutoDRIVE Simulator to develop highly accurate digital twin models. These digital twins serve as instrumental tools for training and deploying MARL policies, allowing both cooperative and competitive traits to emerge in simulation.

Vehicle Dynamics Models: The paper outlines the thorough modeling of vehicle dynamics involving suspension forces, tire forces, and the influence of various parameters such as tire slip, steering angle, and wheel speed. The dynamics are analytically detailed with respect to both rigid and sprung masses, as well as tire interaction forces expressed via a two-piece cubic spline friction curve.
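
While the summary does not reproduce the spline coefficients, the general shape of such a friction curve can be sketched: a rising cubic segment from zero slip to the friction peak, followed by a decaying segment toward the sliding-friction plateau. All knot values below are illustrative assumptions, not the paper's parameters:

```python
# Minimal sketch of a two-piece cubic friction curve (illustrative knot
# values; the paper's actual coefficients are not reproduced here).
# Piece 1 rises from zero slip to the friction peak; piece 2 decays from
# the peak toward the sliding-friction plateau.

def cubic_hermite(x, x0, x1, y0, y1, m0, m1):
    """Evaluate a cubic Hermite segment on [x0, x1]."""
    t = (x - x0) / (x1 - x0)
    h = x1 - x0
    return ((2*t**3 - 3*t**2 + 1) * y0 + (t**3 - 2*t**2 + t) * h * m0
            + (-2*t**3 + 3*t**2) * y1 + (t**3 - t**2) * h * m1)

def friction_coefficient(slip, s_peak=0.15, mu_peak=1.0, mu_slide=0.75):
    """Two-piece cubic friction curve: 0 -> peak, then peak -> plateau."""
    s = abs(slip)
    if s <= s_peak:     # rising segment (flat tangents at both knots)
        return cubic_hermite(s, 0.0, s_peak, 0.0, mu_peak, 0.0, 0.0)
    if s <= 1.0:        # falling segment toward sliding friction
        return cubic_hermite(s, s_peak, 1.0, mu_peak, mu_slide, 0.0, 0.0)
    return mu_slide     # fully sliding beyond unit slip

# The tire force then scales with the normal load Fz, e.g.:
# Fx = friction_coefficient(slip) * Fz * sign(slip)
```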

Sensor Models: In simulated environments, vehicles were equipped with various sensors, including incremental encoders, IMU, IPS, LIDAR, and cameras. These sensor models simulated realistic data acquisition essential for observing environmental states and making informed decisions.
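
As an illustration of one such model, a planar LIDAR can be emulated by casting equally spaced rays from the vehicle pose and intersecting them with environment geometry. The sketch below uses 2D line-segment walls for simplicity, whereas a full simulator would ray-cast against 3D scene geometry:

```python
import math

# Illustrative 2D LIDAR model: cast equally spaced rays from the vehicle
# pose and intersect them with environment walls given as (x1, y1, x2, y2)
# line segments. A sketch only, not the simulator's actual implementation.

def ray_segment_distance(ox, oy, dx, dy, x1, y1, x2, y2):
    """Distance along ray (origin o, direction d) to segment p1-p2, or None."""
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-9:
        return None                                      # ray parallel to segment
    t = ((x1 - ox) * ey - (y1 - oy) * ex) / denom        # distance along ray
    u = ((x1 - ox) * dy - (y1 - oy) * dx) / denom        # position along segment
    return t if t >= 0.0 and 0.0 <= u <= 1.0 else None

def lidar_scan(pose, walls, n_rays=360, max_range=10.0):
    """Return n_rays range readings for a (x, y, heading) pose."""
    x, y, yaw = pose
    ranges = []
    for i in range(n_rays):
        a = yaw + 2.0 * math.pi * i / n_rays
        dx, dy = math.cos(a), math.sin(a)
        hits = [ray_segment_distance(x, y, dx, dy, *w) for w in walls]
        hits = [h for h in hits if h is not None and h <= max_range]
        ranges.append(min(hits) if hits else max_range)  # no hit: report max range
    return ranges
```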

Actuator Models: Simulations included models for both driving and steering actuators with response delays and saturation limits, mirroring their real-world counterparts to enhance the fidelity of the simulations.
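
A common way to capture both effects is a saturation clamp followed by a first-order response lag; the following sketch, with assumed parameter values, illustrates the idea:

```python
# Illustrative actuator model combining command saturation with a
# first-order response lag (a common approximation of actuator delay;
# the paper's exact parameter values are not reproduced here).

class LaggedActuator:
    def __init__(self, limit, time_constant, dt):
        self.limit = limit                      # saturation bound, e.g. max steering angle [rad]
        self.alpha = dt / (time_constant + dt)  # discrete first-order filter gain
        self.state = 0.0                        # current actuator output

    def step(self, command):
        # Saturate the command, then move the output toward it gradually.
        target = max(-self.limit, min(self.limit, command))
        self.state += self.alpha * (target - self.state)
        return self.state

# Example: a steering actuator limited to +/-0.52 rad with a 0.1 s lag,
# updated at 50 Hz (all values assumed for illustration).
steering = LaggedActuator(limit=0.52, time_constant=0.1, dt=0.02)
```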

Environment Models: The simulator supported custom scenario creation through modular kits like the AutoDRIVE IDK, third-party integrations such as RoadRunner, and Unity Terrain integration for both on-road and off-road environments.

Cooperative Multi-Agent Scenario

The cooperative scenario involved intersection traversal in both single-agent and multi-agent settings. Agents shared limited state information with each other via Vehicle-to-Vehicle (V2V) communication.

Problem Formulation and Learning Framework

In the single-agent setting, the ego vehicle learned to traverse the intersection while other vehicles followed predefined paths. For the multi-agent setting, all vehicles learned concurrently in a decentralized manner, increasing the complexity and stochasticity of the environment.

State Space and Observation Space: Agents operated in a POMDP framework capturing partial environmental observations. Observations included the agent's position, velocity, and goal coordinates. Vehicles communicated their states via V2V, making decisions based on both their state and the observed states of peers.
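
A minimal sketch of how such an observation vector might be assembled is given below; the field names and the exact subset of shared state are illustrative assumptions rather than the paper's precise layout:

```python
import numpy as np

# Illustrative observation assembly for intersection traversal: each agent
# observes its own pose, velocity, and goal, plus the limited state that
# peers broadcast over V2V. Field names are assumptions for illustration.

def build_observation(ego, peers):
    ego_part = [ego["x"], ego["y"], ego["yaw"], ego["v"],
                ego["goal_x"], ego["goal_y"]]
    v2v_part = []
    for p in peers:                      # limited state shared over V2V
        v2v_part += [p["x"], p["y"], p["v"]]
    return np.array(ego_part + v2v_part, dtype=np.float32)
```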

Action Space and Reward Function: The action space was defined in terms of discrete steering commands, while the reward function combined penalties for collisions and boundary violations with rewards for successful intersection traversal.
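
The described reward terms can be summarized in a short sketch; the magnitudes below are assumptions, not the paper's values:

```python
# Illustrative reward for intersection traversal, combining the terms the
# summary describes: a terminal penalty for collisions or leaving the road,
# and a terminal reward for reaching the goal. Magnitudes are assumptions.

def intersection_reward(collided, out_of_bounds, reached_goal):
    if collided or out_of_bounds:
        return -1.0                      # terminal penalty
    if reached_goal:
        return +1.0                      # successful traversal
    return 0.0                           # sparse reward otherwise
```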

Training and Deployment: Training utilized a fully connected neural network (FCNN) optimized with the PPO algorithm. Results showed that while single-agent scenarios were fairly deterministic, the multi-agent scenarios presented significant challenges due to their inherent stochastic nature.
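
A minimal sketch of such an FCNN policy, with a categorical head over the discrete action space and a value head for PPO, might look as follows; the layer widths are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Minimal sketch of an FCNN actor-critic as would be optimized by PPO.
# Layer widths and depth are assumptions for illustration.

class PolicyNetwork(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state value for PPO

    def forward(self, obs):
        z = self.body(obs)
        return self.policy_head(z), self.value_head(z)

# Sampling an action during rollout:
# logits, value = net(obs)
# action = torch.distributions.Categorical(logits=logits).sample()
```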

Competitive Multi-Agent Scenario

The competitive scenario simulated head-to-head autonomous racing, in which agents aimed to minimize their lap times on a track without colliding with barriers or opponents.

Problem Formulation and Learning Framework

Hybrid Learning Approach: The study employed a hybrid imitation-reinforcement learning architecture with initial demonstration datasets guiding agent learning, significantly reducing training times.
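
One common realization of such a hybrid scheme, sketched below under the assumption of a behavioral-cloning component, is to supervise the policy on demonstrated actions before or alongside the PPO updates:

```python
import torch
import torch.nn.functional as F

# Illustrative behavioral-cloning step: the policy is pushed toward the
# demonstrated actions, which can substantially shorten training. This is
# one common realization of a hybrid imitation-reinforcement scheme, not
# necessarily the paper's exact formulation.

def bc_update(net, optimizer, demo_obs, demo_actions):
    """One supervised step on a batch of (observation, expert action) pairs."""
    logits, _ = net(demo_obs)                     # reuse the PPO policy network
    loss = F.cross_entropy(logits, demo_actions)  # match the expert's discrete actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```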

Observation Space and Action Space: Observations included the forward velocity and sparse LIDAR measurements, concatenated into a vector fed to the policy network. Actions were discretized into throttle and steering commands.
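
A sketch of this encoding is given below; the beam count and the discrete command values are assumptions for illustration:

```python
import numpy as np

# Illustrative racing observation/action encoding: forward speed plus a
# sparse subsample of the LIDAR scan, and a small lookup table of discrete
# (throttle, steering) pairs. Sizes and values are assumptions.

def racing_observation(forward_velocity, lidar_ranges, n_beams=27):
    idx = np.linspace(0, len(lidar_ranges) - 1, n_beams).astype(int)
    sparse_scan = np.asarray(lidar_ranges, dtype=np.float32)[idx]
    return np.concatenate(([forward_velocity], sparse_scan))

# Discrete action table: each index the policy samples maps to a
# (throttle, steering) command pair.
ACTIONS = [(t, s) for t in (0.5, 1.0) for s in (-1.0, -0.5, 0.0, 0.5, 1.0)]
```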

Reward Function: Agents were rewarded for passing checkpoints, completing laps, and achieving new lap records, while penalties were imposed for collisions. The reward structure incentivized velocity optimization and strategic maneuvering.
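
These terms can be combined into a per-step reward along the following lines; the magnitudes are assumptions, not the paper's values:

```python
# Illustrative racing reward combining the terms described above:
# checkpoint and lap bonuses, a larger bonus for setting a best lap,
# and a collision penalty. Magnitudes are assumptions.

def racing_reward(passed_checkpoint, finished_lap, new_best_lap, collided):
    r = 0.0
    if collided:
        return -1.0                      # terminal penalty
    if passed_checkpoint:
        r += 0.01                        # encourages forward progress
    if finished_lap:
        r += 0.1
    if new_best_lap:
        r += 0.7                         # incentivizes minimizing lap time
    return r
```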

Training and Deployment: As in the cooperative scenario, training used an FCNN optimized with the PPO algorithm. During deployment, the trained policies displayed strategic racing behaviors such as blocking and overtaking.

Implications and Future Directions

The paper demonstrates that reinforcement learning in autonomous driving environments can effectively instill both cooperative and competitive behaviors. The use of the AutoDRIVE Ecosystem for simulating and deploying these behaviors shows promise for further research and practical applications in real-world autonomous vehicle systems. The discussed cases represent only initial explorations, with future work likely focusing on sim2real transfer, where policies trained in simulation are deployed in real-world vehicles while retaining their learned behaviors and efficiencies.

In summation, this research provides substantial insights into the potential and challenges of multi-agent reinforcement learning for autonomous vehicle applications. The implications of this work suggest significant advancements in the domains of intelligent transportation systems, cooperative traffic management, and competitive autonomous racing, paving the way for more adaptive and robust autonomous driving technologies.
