- The paper demonstrates a novel deep reinforcement learning approach using the Soft Actor-Critic (SAC) algorithm that outperforms both the built-in AI and the fastest human players.
- The methodology features a neural network with human-like inputs and a custom reward system designed to minimize lap times under realistic racing conditions.
- Experimental results show a 0.15-second lap-time improvement in a high-speed Audi TT Cup setting, highlighting DRL's potential for autonomous control in complex environments.
Overview of Deep Reinforcement Learning Achieving Super-Human Performance in Gran Turismo Sport
This paper presents an approach to autonomous racing in the Gran Turismo Sport (GTS) simulator using deep reinforcement learning (DRL) and demonstrates that it achieves super-human performance. The work focuses on minimizing lap times on complex tracks, tackling the challenges of high-speed racing under the realistic conditions modeled by GTS, a platform known for the fidelity of its car and track simulation. The research develops a neural-network control policy that outperforms both the built-in AI and human players across multiple racing scenarios.
Methodological Insights
The paper adopts a model-free DRL method based on the Soft Actor-Critic (SAC) algorithm. A proxy reward tied to progress along the course is crafted to address the sparse nature of the lap-time objective. This reward, combined with a wall-contact penalty scaled by the car's kinetic energy, enables efficient learning of a policy that maintains rapid and precise vehicle control under extreme conditions. The policy is a multilayer perceptron trained to map observations directly to actions without explicit trajectory planning, a key distinction from traditional racing approaches that first plan a trajectory and then track it.
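A minimal sketch of how such a reward could be composed is shown below. The progress term and the kinetic-energy-scaled wall penalty follow the description above, but the function signature, weight values, and argument names are illustrative assumptions, not the paper's exact formulation:

```python
def step_reward(progress_prev, progress_curr, speed, mass, wall_contact,
                w_progress=1.0, w_wall=1e-4):
    """Sketch of a progress-based proxy reward with a kinetic-energy wall penalty.

    progress_prev, progress_curr: distance covered along the track centerline (m)
    speed: current vehicle speed (m/s)
    mass: vehicle mass (kg)
    wall_contact: True if the car touched a wall during this step
    """
    # Dense proxy for lap time: reward the progress made along the track
    # since the previous step, rather than waiting for the sparse lap-time signal.
    reward = w_progress * (progress_curr - progress_prev)

    # Penalize wall contact in proportion to kinetic energy, so collisions
    # at high speed are discouraged more strongly than gentle scrapes.
    if wall_contact:
        kinetic_energy = 0.5 * mass * speed ** 2
        reward -= w_wall * kinetic_energy

    return reward
```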
The network's input features are chosen to mirror the information available to human drivers, such as velocity, acceleration, rangefinder measurements, and estimates of upcoming track curvature. This design enables a fair comparison with human players by ensuring the model does not exploit data a human could not observe.
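To make this concrete, the sketch below assembles such human-accessible features into an observation vector and feeds it to a multilayer perceptron that outputs bounded actions. The feature dimensions, layer sizes, and action layout (steering plus combined throttle/brake) are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np
import torch
import torch.nn as nn

def build_observation(velocity, acceleration, rangefinder, curvature, wall_contact):
    """Concatenate human-accessible features into a single observation vector.

    velocity, acceleration: 3D vectors in the car frame (m/s, m/s^2)
    rangefinder: distances to the track bounds along a fan of directions (m)
    curvature: centerline curvature sampled at points ahead of the car (1/m)
    wall_contact: binary flag for current wall contact
    """
    return np.concatenate([
        np.asarray(velocity, dtype=np.float32),
        np.asarray(acceleration, dtype=np.float32),
        np.asarray(rangefinder, dtype=np.float32),
        np.asarray(curvature, dtype=np.float32),
        np.array([float(wall_contact)], dtype=np.float32),
    ])

class PolicyMLP(nn.Module):
    """MLP mapping observations directly to actions (steering, throttle/brake),
    with no explicit trajectory planning step."""

    def __init__(self, obs_dim, act_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# Example forward pass on a single observation (dimensions are illustrative).
obs = build_observation([60.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                        [10.0] * 18, [0.01] * 10, False)
policy = PolicyMLP(obs_dim=obs.shape[0])
action = policy(torch.from_numpy(obs))  # tensor of shape (2,)
```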
Experimental Outcomes
The trained policy is evaluated in three race settings with different car and track combinations. In all scenarios it achieves lap times faster than those set by the fastest human drivers among more than 50,000 competitors. The results highlight, in particular, a 0.15-second lap-time improvement in a high-speed setting with an Audi TT Cup, illustrating the policy's ability to control the car precisely at high speed.
Moreover, trajectory analysis reveals that the DRL policy reproduces expert human strategies, such as out-in-out cornering, and anticipates sharp curves early enough to brake at near-ideal points. This demonstrates its ability to execute tactically sound, agile racing lines and underscores the potential of DRL in a domain traditionally dominated by manual control.
Broader Implications and Future Directions
The implications of this work are significant both theoretically and practically. It suggests that DRL can reliably produce autonomous agents capable of high-performance vehicular control in simulated environments, paving the way for applications in real-world autonomous racing and potentially other domains requiring rapid decision-making under constraints.
Additionally, the successful application in GTS, a setting with limited computational resources, demonstrates that similar methodologies could be adapted to other real-time systems, addressing limitations of existing trajectory-planning approaches. Future research could focus on generalizing the policy to multiple track/car combinations or on incorporating multi-agent dynamics to simulate races with competing vehicles, further enhancing the realism and applicability of DRL in complex systems.
Thus, this paper charts a clear pathway for deploying DRL in autonomous control tasks, yielding insight into its capabilities and motivating further exploration in high-speed vehicular environments and other domains that demand intricate maneuvering.