
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control

(2311.18393)
Published Nov 30, 2023 in cs.LG and cs.RO

Abstract

Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft actor-critic (SAC) require extensive amounts of training data to be collected and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We therefore propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.

Overview

  • Deep reinforcement learning (RL) is showing promise for autonomous vehicle control, but traditional methods require substantial data.

  • This paper explores three data-efficient deep RL methods: REDQ, PETS-MPPI, and MBPO for vehicle trajectory control.

  • A novel formulation splits model-based prediction into a learned vehicle-dynamics model and trajectory deviations computed from prior knowledge, which improves prediction reliability.

  • Simulation tests on CARLA showed these methods matched or outperformed SAC while requiring significantly less data.

  • The study highlights data-efficient RL's potential in reducing training data requirements without sacrificing performance.

Introduction to Data-Efficient Deep Reinforcement Learning

Deep reinforcement learning (RL) has become a significant tool for autonomous systems, particularly for the advanced vehicle control needed in autonomous driving. Unlike common optimization-based control methods, or imitation learning methods that rely on large data sets, RL develops control strategies through interaction with the environment. Although powerful, traditional model-free RL approaches, such as soft actor-critic (SAC), require extensive training data, making them impractical for real-world applications. To tackle this issue, research has been directed towards more data-efficient deep RL methods suitable for vehicle trajectory control.

Novel Approaches to Vehicle Control

Researchers have applied three relatively recent data-efficient deep RL methods to vehicle trajectory control (the core update of the first is sketched after the list):

  1. Randomized Ensemble Double Q-learning (REDQ)
  2. Probabilistic Ensembles with Trajectory Sampling and Model Predictive Path Integral optimizer (PETS-MPPI)
  3. Model-Based Policy Optimization (MBPO)
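
For intuition, here is a minimal sketch of REDQ's central trick: forming the TD target from the minimum over a small random subset of a larger critic ensemble, which keeps Q-value overestimation in check and allows many gradient updates per environment step. The `policy.sample` and critic interfaces are illustrative assumptions, not the paper's code:

```python
import random
import torch

def redq_target(q_ensemble, policy, reward, next_obs, done,
                gamma=0.99, alpha=0.2, subset_size=2):
    """TD target shared by all critics in the ensemble (REDQ-style)."""
    with torch.no_grad():
        next_act, log_prob = policy.sample(next_obs)     # a' ~ pi(.|s'), SAC-style
        subset = random.sample(q_ensemble, subset_size)  # random M of N critics
        q_min = torch.stack(
            [q(next_obs, next_act) for q in subset]).min(dim=0).values
        # Entropy-regularized target; the subset minimum replaces the
        # usual two-critic minimum of SAC.
        return reward + gamma * (1.0 - done) * (q_min - alpha * log_prob)
```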

The standard formulation of model-based RL, typically used in these approaches, proved to be ill-suited for the specifics of trajectory control. In light of this, a novel model-based prediction approach was proposed: instead of learning a complete state-transition model, only the vehicle dynamics are learned, while trajectory deviations are computed from prior knowledge of the reference trajectory. This division simplifies the learning task and enhances the reliability of the model's predictions.
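
The split can be pictured as follows. This is a minimal sketch, assuming a learned dynamics model with a hypothetical `dyn_model.predict` interface and a reference trajectory given as an array of (x, y, heading) waypoints:

```python
import numpy as np

def wrap_to_pi(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def predict_observation(dyn_model, state, action, ref_traj):
    """Split prediction: learned dynamics plus geometric localization."""
    # Learned part: next pose and speed from current state and control input.
    x, y, yaw, speed = dyn_model.predict(state, action)  # hypothetical API

    # Known part: deviations w.r.t. the reference trajectory, computed
    # analytically instead of being predicted by the network.
    dists = np.hypot(ref_traj[:, 0] - x, ref_traj[:, 1] - y)
    i = int(np.argmin(dists))
    lateral_error = dists[i]                         # unsigned, for simplicity
    heading_error = wrap_to_pi(ref_traj[i, 2] - yaw)

    # The agent observes dynamic quantities plus deviation features.
    return np.array([speed, lateral_error, heading_error])
```

Because the trajectory geometry is known exactly, the network only has to capture the vehicle dynamics, which is a smaller and better-conditioned learning problem.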

Empirical Insights from Simulation

The evaluation of these RL methods on the CARLA simulator, a realistic urban driving environment, revealed that they offer performance on par with or better than SAC. More importantly, they require significantly less interaction data, in some cases by more than an order of magnitude. The key findings include:

  • PETS-MPPI achieved stable driving behavior more quickly than SAC, albeit with somewhat lower final performance (its MPPI planner is sketched after this list).
  • REDQ and MBPO matched SAC's final performance while using considerably less data.
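
For context, PETS-MPPI's planner selects actions by rolling sampled action sequences through the learned dynamics model and averaging them with cost-dependent weights. The sketch below shows a basic MPPI step under simplifying assumptions: a zero nominal action plan, a hypothetical `dyn_model.predict` interface, and no probabilistic-ensemble trajectory sampling (which PETS adds on top):

```python
import numpy as np

def mppi_action(dyn_model, cost_fn, state, horizon=20, n_samples=256,
                lam=1.0, sigma=0.5, act_dim=2):
    """One MPPI planning step: sample, roll out, softmax-average by cost."""
    actions = np.random.randn(n_samples, horizon, act_dim) * sigma
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        s = state
        for t in range(horizon):
            s = dyn_model.predict(s, actions[k, t])  # hypothetical model API
            costs[k] += cost_fn(s, actions[k, t])
    weights = np.exp(-(costs - costs.min()) / lam)   # path-integral weighting
    weights /= weights.sum()
    # Execute only the weighted first action, then replan next step.
    return (weights[:, None] * actions[:, 0]).sum(axis=0)
```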

Model-Free and Model-Based Dichotomy

Interestingly, both REDQ and MBPO displayed similar learning speeds and asymptotic performance despite their different underlying frameworks: REDQ operates in a model-free setting, while MBPO employs a model-based approach that augments the available dataset with synthetic rollouts (sketched below). SAC, on the other hand, despite eventually reaching a good control policy, needed substantially more training data to match the performance of these data-efficient methods.
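A rough sketch of MBPO's augmentation loop follows, with hypothetical buffer and model interfaces; the model-free learner (SAC in MBPO) then trains on a mixture of real and synthetic transitions:

```python
def mbpo_augment(model, policy, env_buffer, model_buffer,
                 n_starts=400, rollout_len=1):
    """Branch short model rollouts from real states (MBPO-style)."""
    states = env_buffer.sample_states(n_starts)   # hypothetical buffer API
    for _ in range(rollout_len):                  # short horizons bound model error
        actions = policy.act(states)
        next_states, rewards, dones = model.step(states, actions)  # learned model
        model_buffer.add(states, actions, rewards, next_states, dones)
        states = next_states[~dones]              # continue unfinished branches only
        if states.shape[0] == 0:
            break
    # SAC then samples minibatches from both env_buffer and model_buffer.
```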

Concluding Remarks

This study sheds light on the potential of data-efficient RL in automotive control, showing that the volume of training data can be reduced substantially without compromising performance. These findings can propel the application of RL in areas where data collection is expensive or risky. The work not only moves us closer to the goal of autonomous driving but also opens up new pathways for data-efficient learning in other engineering domains.
