NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking (2406.15349v2)

Published 21 Jun 2024 in cs.CV, cs.AI, cs.LG, and cs.RO

Abstract: Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. Specifically, we gather simulation-based metrics, such as progress and time to collision, by unrolling bird's eye view abstractions of the test scenes for a short simulation horizon. Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other. As we demonstrate empirically, this decoupling allows open-loop metric computation while being better aligned with closed-loop evaluations than traditional displacement errors. NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights. On a large set of challenging scenarios, we observe that simple methods with moderate compute requirements such as TransFuser can match recent large-scale end-to-end driving architectures such as UniAD. Our modular framework can potentially be extended with new datasets, data curation strategies, and metrics, and will be continually maintained to host future challenges. Our code is available at https://github.com/autonomousvision/navsim.

Summary

The paper introduces a non-reactive simulation framework that computes metrics such as progress, time to collision, and comfort by unrolling each test scene over a short horizon, during which the evaluated policy and the environment do not influence each other.
It samples over 100,000 challenging real-world scenarios in which simple "blind" driving policies perform poorly.
The study provides comprehensive benchmarking with standardized splits and an open evaluation server, enabling robust performance comparisons for AV policies.

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

The paper presents NAVSIM, a novel framework for the simulation and benchmarking of vision-based driving policies in autonomous vehicles (AVs). The motivation behind NAVSIM is the need for a balanced evaluation framework that addresses the limitations of both open-loop and closed-loop evaluation methods. Traditional open-loop benchmarks are scalable but fail to capture the dynamic interactions in driving scenarios. Conversely, closed-loop evaluations offer interactive feedback but are computationally intensive and suffer from domain gaps with real-world data.

NAVSIM offers a middle ground by combining large real-world datasets with a non-reactive simulation protocol. The tested policy commits to its planned actions at the start of each scene, and the scene is then unrolled over a short simulation horizon during which the policy and the environment do not influence each other. This decoupling keeps evaluation as scalable as open-loop testing while yielding metrics that are more meaningful for comparing driving policies than traditional displacement errors.
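To make the contrast with displacement errors concrete, here is a minimal sketch of a non-reactive check. It is not NAVSIM's implementation: the circle-based collision test, the `radius` value, and all function names are assumptions made for illustration. Other agents simply replay their logged positions, so the ego plan cannot change what they do.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # one (x, y) position per time step


def average_displacement_error(plan: Trajectory, human: Trajectory) -> float:
    """Open-loop metric: mean Euclidean distance to the logged human trajectory."""
    return sum(math.dist(p, h) for p, h in zip(plan, human)) / len(plan)


def first_collision_step(
    plan: Trajectory, agents: List[Trajectory], radius: float = 1.5
) -> Optional[int]:
    """Non-reactive rollout: agents replay their logged positions regardless of the plan.

    Ego and agents are approximated as circles of the same (assumed) radius.
    Returns the first time step at which two circles overlap, or None if the
    plan stays collision-free over the whole horizon.
    """
    for t, ego in enumerate(plan):
        for agent in agents:
            if math.dist(ego, agent[t]) < 2 * radius:
                return t
    return None


# Logged human drive and one stationary vehicle to the left of the lane.
human = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)]
parked = [(15.0, 3.5)] * 4

# Two plans that deviate from the human log by the same amount, in opposite directions.
plan_right = [(x, -1.0) for x, _ in human]
plan_left = [(x, 1.0) for x, _ in human]

for name, plan in [("right", plan_right), ("left", plan_left)]:
    ade = average_displacement_error(plan, human)
    hit = first_collision_step(plan, [parked])
    print(f"{name}: ADE={ade:.2f}, first collision step={hit}")
```

Both plans have the same displacement error, yet only the one drifting toward the parked vehicle is flagged by the rollout. This is the kind of difference a simulation-based metric can surface and a displacement error cannot.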

Key Contributions

Non-Reactive Simulation Framework: NAVSIM features a non-reactive simulation in which the evaluated policy and the environment do not influence each other once the policy has committed to its initial plan. Metrics such as progress, time to collision (TTC), and comfort are computed under this non-reactive assumption, which simplifies the simulation while aligning better with closed-loop evaluation than traditional metrics such as average displacement error (ADE).
Challenging Scenario Sampling: The authors introduce a method for sampling challenging driving scenarios from the largest publicly available driving dataset, resulting in over 100,000 real-world scenarios. In these scenarios, simple "blind" driving policies, which ignore sensor input, perform poorly, highlighting the need for principled sensor-based policies.
Comprehensive Benchmarking: NAVSIM includes standardized training and evaluation splits and an open evaluation server hosted on Hugging Face. This setup was used for a competition at CVPR 2024, where 143 teams submitted 463 entries, yielding several new insights into the performance of different driving architectures.
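The scenario-sampling idea above can be sketched as a simple filter: score every candidate scene with a blind baseline and keep only the scenes where that baseline does badly. The sketch below is an assumption-laden illustration, not the authors' pipeline. The `Scene` class, the constant-velocity baseline, the 0.5 s step, the 8-step horizon, and the 0.8 threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scene:
    token: str          # hypothetical scene identifier
    blind_score: float  # score in [0, 1] achieved by a sensor-free baseline


def constant_velocity_plan(
    position: Tuple[float, float],
    velocity: Tuple[float, float],
    dt: float = 0.5,
    steps: int = 8,
) -> List[Tuple[float, float]]:
    """A 'blind' plan: extrapolate the current velocity, ignoring all sensor input."""
    x, y = position
    vx, vy = velocity
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]


def select_challenging(scenes: List[Scene], max_blind_score: float = 0.8) -> List[Scene]:
    """Keep only scenes in which the blind baseline scores below the threshold."""
    return [s for s in scenes if s.blind_score < max_blind_score]


scenes = [Scene("a", 0.95), Scene("b", 0.30), Scene("c", 0.80)]
print([s.token for s in select_challenging(scenes)])
```

In a real pipeline, `blind_score` would come from simulating the blind plan in each scene. Filtering this way removes trivial scenes, such as driving straight on an empty road, that would otherwise dominate the average.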

Numerical Insights

Empirical evaluation on NAVSIM's challenging scenarios yields two main findings:

TransFuser and UniAD, two modern end-to-end driving architectures, perform similarly despite large differences in the compute required to train them. Notably, a simple method with moderate compute requirements such as TransFuser can match a large-scale architecture such as UniAD when benchmarked in NAVSIM.
The framework discerns subtle differences in driving policy performance that are overlooked by traditional open-loop metrics. For instance, PDMS (Predictive Driver Model Score) captures nuances related to safety (no collisions), adherence to drivable areas, TTC, comfort, and progress efficiency.
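As a rough sketch of how the sub-metrics listed above could be folded into a single score, the function below multiplies the two safety-critical terms (no collisions, drivable area compliance) into a weighted average of progress, TTC, and comfort. The multiplicative structure and the 5/5/2 weights are recalled from the NAVSIM paper rather than stated in this summary, so treat them as assumptions. The function and parameter names are hypothetical.

```python
from typing import Tuple


def pdm_score(
    no_collision: float,
    drivable_area: float,
    progress: float,
    ttc: float,
    comfort: float,
    weights: Tuple[float, float, float] = (5.0, 5.0, 2.0),  # assumed weights
) -> float:
    """Aggregate sub-scores in [0, 1] into one score in [0, 1].

    no_collision and drivable_area act as multiplicative penalties, so a
    failure on either one drives the whole score to zero. The remaining
    terms are combined as a weighted average.
    """
    w_progress, w_ttc, w_comfort = weights
    weighted = (w_progress * progress + w_ttc * ttc + w_comfort * comfort) / (
        w_progress + w_ttc + w_comfort
    )
    return no_collision * drivable_area * weighted


print(pdm_score(1.0, 1.0, 1.0, 1.0, 1.0))            # flawless plan
print(pdm_score(0.0, 1.0, 1.0, 1.0, 1.0))            # any collision zeroes the score
print(round(pdm_score(1.0, 1.0, 0.5, 1.0, 1.0), 3))  # safe but slow
```

The design choice worth noting is the product: a plan cannot compensate for a collision or for leaving the drivable area by making more progress, whereas a plain weighted sum would allow that trade-off.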

Implications and Future Directions

The introduction of NAVSIM has immediate theoretical and practical implications.

Theoretically, it provides a more robust and scalable methodology for evaluating AV driving policies, bridging the gap between open-loop and closed-loop evaluation. This can catalyze future research on interpreting and improving policy decisions in dynamic, multifaceted driving environments.
Practically, NAVSIM can streamline the benchmarking and development process for AV algorithms, enabling the community to focus on designing policies that perform well in realistic conditions without incurring the substantial costs of closed-loop simulation.

Looking ahead, NAVSIM's flexibility allows it to be extended with new datasets and metrics. This modularity ensures that NAVSIM can evolve alongside advancements in AV technology and dataset availability. Future developments might include integrating reactive elements to further narrow the gap between simulation and real-world driving or expanding the range of evaluation scenarios to encompass diverse geographical and environmental conditions.

In summary, NAVSIM's data-driven, non-reactive simulation paradigm represents a significant step forward in autonomous vehicle benchmarking. By addressing the limitations of current evaluation standards, it promises to be an invaluable tool for the development and assessment of next-generation autonomous driving systems.

GitHub

GitHub - autonomousvision/navsim: NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation (274 stars)
