Human-compatible driving partners through data-regularized self-play reinforcement learning (2403.19648v2)
Abstract: A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.
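The abstract describes HR-PPO as standard PPO self-play augmented with a small penalty for deviating from a human reference policy (obtained from imitation of the 30 minutes of demonstrations). Below is a minimal PyTorch sketch of what such a regularized policy loss could look like. The discrete action space, the direction of the KL term, the function name `hr_ppo_loss`, and the weight `lam` are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def hr_ppo_loss(logits, old_logits, ref_logits, actions, advantages,
                clip_eps=0.2, lam=0.1):
    """Sketch of a human-regularized PPO policy loss.

    `ref_logits` come from a frozen reference policy trained on human
    demonstrations (e.g., via behavioral cloning); `lam` weights the
    penalty for deviating from it. The exact penalty form used in the
    paper may differ; this is an illustration, not the paper's code.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    old_log_probs = F.log_softmax(old_logits, dim=-1).detach()
    ref_log_probs = F.log_softmax(ref_logits, dim=-1).detach()

    # Standard PPO clipped surrogate objective.
    lp_a = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    old_lp_a = old_log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    ratio = torch.exp(lp_a - old_lp_a)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages,
    )

    # Penalty for deviating from the human reference policy:
    # per-state KL(pi_theta || pi_ref), averaged over the batch.
    kl_to_human = (log_probs.exp() * (log_probs - ref_log_probs)).sum(-1)

    # Maximize surrogate, minimize deviation from human behavior.
    return -(surrogate - lam * kl_to_human).mean()
```

Per the abstract, the regularization weight is kept small: the agents remain RL-first, with the human term nudging them toward human-like driving rather than dominating the task reward.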