DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (2309.16292v3)
Abstract: Recent advances in autonomous driving have relied on data-driven approaches, which, although widely adopted, face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore how to instill similar capabilities into autonomous driving systems and, to address this question, summarize a paradigm that integrates an interactive environment, a driver agent, and a memory component. Leveraging large language models (LLMs) with emergent abilities, we propose the DiLu framework, which combines a Reasoning module and a Reflection module so that the system can make decisions based on common-sense knowledge and evolve continuously. Extensive experiments show DiLu's capability to accumulate experience and demonstrate a significant advantage in generalization over reinforcement-learning-based methods. Moreover, DiLu can acquire experience directly from real-world datasets, which highlights its potential for deployment in practical autonomous driving systems. To the best of our knowledge, we are the first to leverage knowledge-driven capability in decision-making for autonomous vehicles. Through the proposed DiLu framework, the LLM is strengthened to apply knowledge and to reason causally in the autonomous driving domain. Project page: https://pjlab-adg.github.io/DiLu/
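The decision loop the abstract outlines (a driver agent that reasons over accumulated experience, and a reflection step that feeds checked or corrected outcomes back into memory) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not DiLu's implementation: the names `Experience`, `Memory`, `reason`, and `reflect`, the use of cosine similarity over embeddings for recall, the prompt format, and the toy stubs `toy_embed` and `toy_llm` that stand in for a real text encoder and language model are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical memory record: a textual scenario, its embedding, and the
# decision judged appropriate for it. Not DiLu's actual data format.
@dataclass
class Experience:
    scenario: str
    embedding: List[float]
    decision: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Memory:
    items: List[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def recall(self, query: List[float], k: int) -> List[Experience]:
        # Rank stored experiences by similarity to the query and keep the top k.
        ranked = sorted(self.items, key=lambda e: cosine(e.embedding, query), reverse=True)
        return ranked[:k]


def reason(scenario: str, embed: Callable[[str], List[float]], memory: Memory,
           llm: Callable[[str], str], k: int = 3) -> str:
    # Reasoning step (assumed form): recalled experiences become few-shot
    # examples in the prompt, followed by the current scenario.
    shots = memory.recall(embed(scenario), k)
    examples = [f"Scenario: {e.scenario}\nDecision: {e.decision}" for e in shots]
    prompt = "\n\n".join(examples + [f"Scenario: {scenario}\nDecision:"])
    return llm(prompt)


def reflect(scenario: str, decision: str, was_safe: bool,
            embed: Callable[[str], List[float]], memory: Memory,
            corrector: Callable[[str, str], str]) -> str:
    # Reflection step (assumed form): keep safe decisions as-is, replace unsafe
    # ones with a corrected decision, and write the result back to memory.
    final = decision if was_safe else corrector(scenario, decision)
    memory.add(Experience(scenario, embed(scenario), final))
    return final


# --- Toy stand-ins (hypothetical) for a text encoder and an LLM ---
VOCAB = ["slow", "car", "ahead", "lane", "clear", "merge"]


def toy_embed(text: str) -> List[float]:
    # Bag-of-words counts over a tiny fixed vocabulary.
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_llm(prompt: str) -> str:
    # Copies the decision of the first (most similar) example; "IDLE" if none.
    answers = [ln for ln in prompt.splitlines() if ln.startswith("Decision: ")]
    return answers[0][len("Decision: "):] if answers else "IDLE"


memory = Memory()
for text, action in [("slow car ahead in lane", "LANE_LEFT"), ("lane clear ahead", "FASTER")]:
    memory.add(Experience(text, toy_embed(text), action))

d = reason("slow car ahead", toy_embed, memory, toy_llm, k=1)
final = reflect("lane clear ahead merge", "FASTER", False, toy_embed, memory,
                corrector=lambda s, dec: "SLOWER")
print(d, final, len(memory.items))  # → LANE_LEFT SLOWER 3
```

In this sketch only the memory changes between episodes: the stubbed model is never retrained, so accumulated experience takes effect purely through which few-shot examples are recalled for the next decision.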