- The paper presents data-efficient deep RL methods that reduce the training data needed for vehicle trajectory control by more than an order of magnitude compared with a soft actor-critic (SAC) baseline.
- It shows that REDQ and MBPO, despite being model-free and model-based respectively, achieve similar learning speeds and final performance.
- PETS-MPPI attains stable driving behavior faster than SAC, though with somewhat lower final performance.
Introduction to Data-Efficient Deep Reinforcement Learning
Deep reinforcement learning (RL) has become an important tool for autonomous systems, particularly for the vehicle control tasks that underpin autonomous driving. Unlike optimization-based control methods or imitation-learning approaches that rely on large datasets, RL develops control strategies through interaction with the environment. However, traditional model-free RL algorithms such as the soft actor-critic (SAC) require extensive amounts of training data, which makes them poorly suited to real-world applications. To address this, research has turned toward more data-efficient deep RL methods for vehicle trajectory control.
Novel Approaches to Vehicle Control
Researchers applied three relatively recent data-efficient deep RL methods to vehicle trajectory control (a minimal sketch of REDQ's critic update follows the list):
- Randomized Ensemble Double Q-learning (REDQ)
- Probabilistic Ensembles with Trajectory Sampling and Model Predictive Path Integral optimizer (PETS-MPPI)
- Model-Based Policy Optimization (MBPO)
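The core of REDQ's data efficiency is a large ensemble of critics, a target computed from the minimum over a small random subset of that ensemble, and many gradient updates per environment step. Below is a minimal, hedged sketch of that critic update in PyTorch; the names `policy`, `q_nets`, `q_targets`, and the hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import random
import torch

N_ENSEMBLE = 10   # number of Q-networks in the ensemble
M_SUBSET = 2      # size of the random subset used for the target
GAMMA = 0.99
UTD_RATIO = 20    # in training, this update runs UTD_RATIO times per environment step

def redq_critic_update(batch, q_nets, q_targets, q_optims, policy, alpha):
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next, logp_next = policy.sample(s_next)          # SAC-style action + log-prob
        idx = random.sample(range(N_ENSEMBLE), M_SUBSET)   # random subset of target critics
        q_next = torch.min(
            torch.stack([q_targets[i](s_next, a_next) for i in idx]), dim=0
        ).values
        target = r + GAMMA * (1.0 - done) * (q_next - alpha * logp_next)
    # every critic in the ensemble regresses toward the same subset-min target
    for q, opt in zip(q_nets, q_optims):
        loss = torch.nn.functional.mse_loss(q(s, a), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The high update-to-data ratio is what lets REDQ extract far more learning signal from each environment interaction than standard SAC.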
The standard model-based RL formulation typically used in these approaches proved ill-suited to the specifics of trajectory control. The authors therefore propose a novel model-based prediction scheme: instead of learning a complete state transition model, only the vehicle dynamics are learned, while trajectory deviations are computed from prior knowledge. This split simplifies the learning task and makes the model's predictions more reliable.
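To make the split concrete, the sketch below separates a learned vehicle-dynamics prediction from a purely geometric computation of the trajectory deviations. The state layout, `dyn_model.predict`, and the error definitions are illustrative assumptions, not the authors' code.

```python
import numpy as np

def predict_next_observation(dyn_model, vehicle_state, action, reference_path):
    """vehicle_state: (x, y, yaw, v); reference_path: array of (x, y, yaw) waypoints."""
    # learned part: a dynamics model predicts only the vehicle's own motion
    next_state = dyn_model.predict(vehicle_state, action)

    # known part: deviations from the reference trajectory follow from geometry
    x, y, yaw, v = next_state
    dists = np.hypot(reference_path[:, 0] - x, reference_path[:, 1] - y)
    ref_x, ref_y, ref_yaw = reference_path[np.argmin(dists)]
    # signed lateral offset of the vehicle from the path
    lateral_error = np.cos(ref_yaw) * (y - ref_y) - np.sin(ref_yaw) * (x - ref_x)
    # heading misalignment, wrapped to [-pi, pi]
    heading_error = (yaw - ref_yaw + np.pi) % (2 * np.pi) - np.pi
    return next_state, np.array([lateral_error, heading_error, v])
```

Because the geometric part is exact by construction, prediction errors can only come from the much smaller learned dynamics model.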
Empirical Insights from Simulation
The evaluation of these RL methods in the CARLA simulator, a realistic urban driving environment, showed that they perform on par with or better than SAC. More importantly, they require significantly less interaction data, in some cases more than an order of magnitude less. The key findings include:
- PETS-MPPI achieved stable driving behavior more quickly than SAC, albeit with lower final performance (a rough sketch of the MPPI planning step follows this list).
- REDQ and MBPO matched SAC's final performance while using considerably less data.
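For context on how PETS-MPPI acts without a learned policy, here is a rough sketch of one MPPI planning step on top of a learned dynamics ensemble. The function names, cost definition, and hyperparameters are assumptions for illustration only.

```python
import numpy as np

def mppi_plan(ensemble_step, cost_fn, state, nominal_controls,
              n_samples=256, horizon=20, noise_std=0.3, temperature=1.0):
    """Sample perturbed control sequences, roll them out through the learned
    model, and return a cost-weighted average sequence (MPC-style)."""
    noise = np.random.normal(0.0, noise_std,
                             size=(n_samples, horizon, nominal_controls.shape[-1]))
    controls = nominal_controls[None] + noise            # [n_samples, horizon, act_dim]
    costs = np.zeros(n_samples)
    states = np.repeat(state[None], n_samples, axis=0)
    for t in range(horizon):
        states = ensemble_step(states, controls[:, t])   # learned probabilistic dynamics
        costs += cost_fn(states, controls[:, t])         # e.g. trajectory deviation + comfort
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return np.einsum('n,nha->ha', weights, controls)     # only the first action is executed
```

Because planning only needs a usable dynamics model rather than a converged policy, this kind of controller can behave sensibly early in training, which is consistent with the fast onset of stable driving reported above.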
Model-Free and Model-Based Dichotomy
Interestingly, REDQ and MBPO displayed similar learning speeds and asymptotic performance despite their different underlying frameworks: REDQ is model-free, whereas MBPO is model-based and augments the replay data with short synthetic rollouts from a learned dynamics model. SAC eventually reached a comparable control policy, but it needed substantially more training data to get there.
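As a hedged illustration of how MBPO augments the data, the sketch below branches short synthetic rollouts from real states and stores them alongside the real transitions. The buffer and model APIs (`sample_states`, `predict`, `add`) are assumptions, not MBPO's reference implementation.

```python
import numpy as np

def generate_model_rollouts(model, policy, env_buffer, model_buffer,
                            n_branches=400, rollout_len=5):
    states = env_buffer.sample_states(n_branches)      # branch from real, observed states
    for _ in range(rollout_len):                       # short rollouts limit compounding model error
        actions = policy.act(states)
        next_states, rewards, dones = model.predict(states, actions)  # learned ensemble model
        model_buffer.add(states, actions, rewards, next_states, dones)
        if np.all(dones):
            break
        states = next_states[~dones]                   # continue only the non-terminated branches
    # the SAC learner then trains on a mix of env_buffer and model_buffer transitions
```

This augmentation lets MBPO run many more policy and critic updates per real environment step, which mirrors REDQ's high update-to-data ratio on the model-free side.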
Concluding Remarks
This paper highlights the potential of data-efficient RL for automotive control, substantially reducing the amount of training data needed without compromising performance. These findings can promote the use of RL in settings where data collection is expensive or risky. The work moves us a step closer to learned control for autonomous driving and points toward data-efficient learning in other areas of engineering and technology.