End-to-end Autonomous Driving: Challenges and Frontiers (2306.16927v3)
Abstract: The autonomous driving community has witnessed rapid growth in approaches that embrace an end-to-end algorithmic framework, using raw sensor input to generate vehicle motion plans instead of concentrating on individual tasks such as detection and motion prediction. Compared with modular pipelines, end-to-end systems benefit from joint feature optimization across perception and planning. The field has flourished thanks to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends of end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, among others. We also discuss current advances in foundation models and visual pre-training, and how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository with up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
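The distinction the abstract draws between modular pipelines and end-to-end systems can be made concrete with a small sketch. The following illustrative PyTorch snippet is not from the survey; all module names, feature sizes, and the command vocabulary are assumptions. It shows the interface of a conditional-imitation-learning-style planner in the spirit of Codevilla et al. (cited below): a raw camera frame and a discrete navigation command map directly to future waypoints, so a single imitation loss optimizes perception and planning jointly.

```python
# Minimal sketch (illustrative only) of an end-to-end driving model's interface:
# raw sensor input + navigation command -> future waypoints.
import torch
import torch.nn as nn

class EndToEndPlanner(nn.Module):
    """Hypothetical planner: camera image + discrete command -> BEV waypoints."""

    def __init__(self, num_commands: int = 4, num_waypoints: int = 4):
        super().__init__()
        # Perception backbone: encodes the raw image into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Conditioning on the navigation command (cf. conditional imitation
        # learning): the command selects the behavior, e.g. "turn left".
        self.command_embed = nn.Embedding(num_commands, 64)
        # Planning head: trained jointly with the encoder through the
        # imitation loss, rather than as a separately engineered module.
        self.head = nn.Linear(64 + 64, num_waypoints * 2)
        self.num_waypoints = num_waypoints

    def forward(self, image: torch.Tensor, command: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(image)                                # (B, 64)
        cond = torch.cat([feat, self.command_embed(command)], dim=-1)
        return self.head(cond).view(-1, self.num_waypoints, 2)   # (B, N, 2)

model = EndToEndPlanner()
image = torch.randn(1, 3, 128, 256)  # raw front-camera frame (assumed size)
command = torch.tensor([2])          # e.g. "turn left" at the next intersection
waypoints = model(image, command)
# Behavior cloning: a regression loss against expert waypoints (zeros here as
# a stand-in) backpropagates through planning *and* perception in one pass.
loss = nn.functional.l1_loss(waypoints, torch.zeros_like(waypoints))
```

The point of the sketch is the single gradient path: unlike a modular stack, where detection or prediction modules are trained against their own intermediate labels, here the planning objective shapes the perception features directly.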
- S. Casas, A. Sadat, and R. Urtasun, “MP3: A unified model to map, perceive, predict and plan,” in CVPR, 2021.
- Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in CVPR, 2023.
- D. A. Pomerleau, “ALVINN: An autonomous land vehicle in a neural network,” in NeurIPS, 1988.
- A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D. Lam, A. Bewley, and A. Shah, “Learning to drive in a day,” in ICRA, 2019.
- D. Chen, B. Zhou, V. Koltun, and P. Krähenbühl, “Learning by cheating,” in CoRL, 2020.
- A. Prakash, K. Chitta, and A. Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” in CVPR, 2021.
- N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger, “KING: Generating safety-critical driving scenarios for robust imitation via kinematics gradients,” in ECCV, 2022.
- M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., “End to end learning for self-driving cars,” arXiv.org, vol. 1604.07316, 2016.
- F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in ICRA, 2018.
- A. Prakash, A. Behl, E. Ohn-Bar, K. Chitta, and A. Geiger, “Exploring data aggregation in policy learning for vision-based urban autonomous driving,” in CVPR, 2020.
- K. Chitta, A. Prakash, and A. Geiger, “NEAT: Neural attention fields for end-to-end autonomous driving,” in ICCV, 2021.
- P. Wu, L. Chen, H. Li, X. Jia, J. Yan, and Y. Qiao, “Policy pre-training for autonomous driving via self-supervised geometric modeling,” in ICLR, 2023.
- CARLA, “CARLA autonomous driving leaderboard.” https://leaderboard.carla.org/, 2022.
- H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles,” in CVPR Workshops, 2021.
- J. Hawke, R. Shen, C. Gurau, S. Sharma, D. Reda, N. Nikolov, P. Mazur, S. Micklethwaite, N. Griffiths, A. Shah, et al., “Urban driving with conditional imitation learning,” in ICRA, 2020.
- F. Codevilla, E. Santana, A. M. López, and A. Gaidon, “Exploring the limitations of behavior cloning for autonomous driving,” in ICCV, 2019.
- X. Liang, T. Wang, L. Yang, and E. Xing, “CIRL: Controllable imitative reinforcement learning for vision-based self-driving,” in ECCV, 2018.
- M. Toromanoff, E. Wirbel, and F. Moutarde, “End-to-end model-free reinforcement learning for urban driving using implicit affordances,” in CVPR, 2020.
- R. Chekroun, M. Toromanoff, S. Hornauer, and F. Moutarde, “GRI: General reinforced imitation and its application to vision-based autonomous driving,” arXiv.org, vol. 2111.08575, 2021.
- D. Chen, V. Koltun, and P. Krähenbühl, “Learning to drive from a world on rails,” in ICCV, 2021.
- Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “End-to-end urban driving by imitating a reinforcement learning coach,” in ICCV, 2021.
- P. Wu, X. Jia, L. Chen, J. Yan, H. Li, and Y. Qiao, “Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in NeurIPS, 2022.
- J. Zhang, Z. Huang, and E. Ohn-Bar, “Coaching a teachable student,” in CVPR, 2023.
- Y. Pan, C.-A. Cheng, K. Saigol, K. Lee, X. Yan, E. A. Theodorou, and B. Boots, “Agile autonomous driving using end-to-end deep imitation learning,” in RSS, 2017.
- J. Zhang and K. Cho, “Query-efficient imitation learning for end-to-end simulated driving,” in AAAI, 2017.
- S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in AISTATS, 2011.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017.
- K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “TransFuser: Imitation with transformer-based sensor fusion for autonomous driving,” PAMI, 2022.
- H. Shao, L. Wang, R. Chen, H. Li, and Y. Liu, “Safety-enhanced autonomous driving using interpretable sensor fusion transformer,” in CoRL, 2022.
- X. Jia, P. Wu, L. Chen, J. Xie, C. He, J. Yan, and H. Li, “Think twice before driving: Towards scalable decoders for end-to-end autonomous driving,” in CVPR, 2023.
- B. Jaeger, K. Chitta, and A. Geiger, “Hidden biases of end-to-end driving models,” arXiv.org, vol. 2306.07957, 2023.
- W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” in CVPR, 2019.
- J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” in ECCV, 2018.
- J. Wang, A. Pun, J. Tu, S. Manivasagam, A. Sadat, S. Casas, M. Ren, and R. Urtasun, “AdvSim: Generating safety-critical scenarios for self-driving vehicles,” in CVPR, 2021.
- W. Ding, B. Chen, M. Xu, and D. Zhao, “Learning to collide: An adaptive safety-critical scenarios generating method,” in IROS, 2020.
- Q. Zhang, Z. Peng, and B. Zhou, “Learning to drive by watching youtube videos: Action-conditioned contrastive policy pretraining,” in ECCV, 2022.
- J. Zhang, R. Zhu, and E. Ohn-Bar, “SelfD: Self-learning large-scale driving policies from the web,” in CVPR, 2022.
- S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, “ST-P3: End-to-end vision-based autonomous driving via spatial-temporal feature learning,” in ECCV, 2022.
- A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, “Perceive, predict, and plan: Safe motion planning through interpretable semantic representations,” in ECCV, 2020.
- J. Janai, F. Güney, A. Behl, and A. Geiger, “Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art,” arXiv.org, vol. 1704.05519, 2017.
- A. Tampuu, T. Matiisen, M. Semikin, D. Fishman, and N. Muhammad, “A survey of end-to-end driving: Architectures and training methods,” TNNLS, 2020.
- S. Teng, X. Hu, P. Deng, B. Li, Y. Li, D. Yang, Y. Ai, L. Li, L. Chen, Z. Xuanyuan, et al., “Motion planning for autonomous driving: The state of the art and future perspectives,” arXiv.org, vol. 2303.09824, 2023.
- D. Coelho and M. Oliveira, “A review of end-to-end autonomous driving in urban environments,” IEEE Access, 2022.
- A. O. Ly and M. Akhloufi, “Learning to drive by imitation: An overview of deep behavior cloning methods,” TIV, 2020.
- L. Le Mero, D. Yi, M. Dianati, and A. Mouzakitis, “A survey on imitation learning techniques for end-to-end autonomous vehicles,” TITS, 2022.
- B. Zheng, S. Verma, J. Zhou, I. W. Tsang, and F. Chen, “Imitation learning: Progress, taxonomies and challenges,” TNNLS, 2022.
- Z. Zhu and H. Zhao, “A survey of deep RL and IL for autonomous driving policy learning,” TITS, 2021.
- B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. A. Sallab, S. K. Yogamani, and P. Pérez, “Deep reinforcement learning for autonomous driving: A survey,” TITS, 2021.
- M. Bain and C. Sammut, “A framework for behavioural cloning,” in Machine Intelligence 15, 1995.
- B. D. Ziebart, A. L. Maas, J. A. Bagnell, A. K. Dey, et al., “Maximum entropy inverse reinforcement learning,” in AAAI, 2008.
- Y. LeCun, E. Cosatto, J. Ben, U. Muller, and B. Flepp, “DAVE: Autonomous off-road vehicle control using end-to-end learning,” Tech. Rep. DARPA-IPTO Final Report, Courant Institute/CBLL, 2004.
- D. Chen and P. Krähenbühl, “Learning from all vehicles,” in CVPR, 2022.
- K. Judah, A. P. Fern, T. G. Dietterich, and P. Tadepalli, “Active imitation learning: Formal and practical reductions to iid learning,” JMLR, 2014.
- S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in AISTATS, 2010.
- S. Ross and J. A. Bagnell, “Reinforcement and imitation learning via interactive no-regret learning,” arXiv.org, vol. 1406.5979, 2014.
- A. E. Sallab, M. Saeed, O. A. Tawab, and M. Abdou, “Meta learning framework for automated driving,” arXiv.org, vol. 1706.04038, 2017.
- C. Wen, J. Lin, T. Darrell, D. Jayaraman, and Y. Gao, “Fighting copycat agents in behavioral cloning from observation histories,” in NeurIPS, 2020.
- C. Wen, J. Lin, J. Qian, Y. Gao, and D. Jayaraman, “Keyframe-focused visual imitation learning,” in ICML, 2021.
- J. Park, Y. Seo, C. Liu, L. Zhao, T. Qin, J. Shin, and T.-Y. Liu, “Object-aware regularization for addressing causal confusion in imitation learning,” in NeurIPS, 2021.
- C. Wen, J. Qian, J. Lin, J. Teng, D. Jayaraman, and Y. Gao, “Fighting fire with fire: Avoiding DNN shortcuts through priming,” in ICML, 2022.
- D. Brown, W. Goo, P. Nagarajan, and S. Niekum, “Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations,” in ICML, 2019.
- C. Finn, S. Levine, and P. Abbeel, “Guided cost learning: Deep inverse optimal control via policy optimization,” in ICML, 2016.
- S. Reddy, A. D. Dragan, and S. Levine, “SQIL: Imitation learning via reinforcement learning with sparse rewards,” arXiv.org, vol. 1905.11108, 2019.
- S. Luo, H. Kasaei, and L. Schomaker, “Self-imitation learning by planning,” in ICRA, 2021.
- J. Ho and S. Ermon, “Generative adversarial imitation learning,” in NeurIPS, 2016.
- Y. Li, J. Song, and S. Ermon, “InfoGAIL: Interpretable imitation learning from visual demonstrations,” in NeurIPS, 2017.
- G. Lee, D. Kim, W. Oh, K. Lee, and S. Oh, “MixGAIL: Autonomous driving using demonstrations with mixed qualities,” in IROS, 2020.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, 2020.
- H. Wang, P. Cai, R. Fan, Y. Sun, and M. Liu, “End-to-end interactive prediction and planning with optical flow distillation for autonomous driving,” in CVPR Workshops, 2021.
- P. Hu, A. Huang, J. Dolan, D. Held, and D. Ramanan, “Safe local motion planning with self-supervised freespace forecasting,” in CVPR, 2021.
- T. Khurana, P. Hu, A. Dave, J. Ziglar, D. Held, and D. Ramanan, “Differentiable raycasting for self-supervised occupancy forecasting,” in ECCV, 2022.
- H. Wang, P. Cai, Y. Sun, L. Wang, and M. Liu, “Learning interpretable end-to-end vision-based motion planning for autonomous driving with optical flow distillation,” in ICRA, 2021.
- R. S. Sutton and A. G. Barto, “Reinforcement learning: An introduction,” MIT Press, 1998.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature, 2015.
- M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” JAIR, 2013.
- D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. Van Hasselt, and D. Silver, “Distributed prioritized experience replay,” arXiv.org, vol. 1803.00933, 2018.
- A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in CoRL, 2017.
- J. Bjorck, C. P. Gomes, and K. Q. Weinberger, “Towards deeper deep reinforcement learning with spectral normalization,” in NeurIPS, 2021.
- M. Toromanoff, E. Wirbel, and F. Moutarde, “Is deep reinforcement learning really superhuman on Atari? Leveling the playing field,” arXiv.org, vol. 1908.04683, 2019.
- E. Ohn-Bar, A. Prakash, A. Behl, K. Chitta, and A. Geiger, “Learning situational driving,” in CVPR, 2020.
- W. B. Knox, A. Allievi, H. Banzhaf, F. Schmitt, and P. Stone, “Reward (mis)design for autonomous driving,” arXiv.org, vol. 2104.13906, 2021.
- C. Zhang, R. Guo, W. Zeng, Y. Xiong, B. Dai, R. Hu, M. Ren, and R. Urtasun, “Rethinking closed-loop training for autonomous driving,” in ECCV, 2022.
- D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, “Dream to control: Learning behaviors by latent imagination,” in ICLR, 2020.
- D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, “Mastering Atari with discrete world models,” in ICLR, 2021.
- D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” in NeurIPS, 2018.
- Y. Abeysirigoonawardena, F. Shkurti, and G. Dudek, “Generating adversarial driving scenarios in high-fidelity simulators,” in ICRA, 2019.
- Q. Wang, Z. Wang, K. Genova, P. P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. A. Funkhouser, “IBRNet: Learning multi-view image-based rendering,” in CVPR, 2021.
- B. Wymann, E. Espié, C. Guionneau, C. Dimitrakakis, R. Coulom, and A. Sumner, “TORCS, the open racing car simulator,” software available at http://torcs.sourceforge.net, 2000.
- M. Martinez, C. Sitawarin, K. Finch, L. Meincke, A. Yablonski, and A. Kornhauser, “Beyond Grand Theft Auto V for training, testing and enhancing deep learning in self-driving cars,” arXiv.org, vol. 1712.01397, 2017.
- P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y.-P. Flötteröd, R. Hilbrich, L. Lücken, J. Rummel, P. Wagner, and E. Wießner, “Microscopic traffic simulation using SUMO,” in ITSC, 2018.
- Deepdrive Team, “Deepdrive: A simulator that allows anyone with a PC to push the state-of-the-art in self-driving.” https://github.com/deepdrive/deepdrive, 2020.
- Q. Li, Z. Peng, L. Feng, Q. Zhang, Z. Xue, and B. Zhou, “MetaDrive: Composing diverse driving scenarios for generalizable reinforcement learning,” PAMI, 2022.
- D. J. Fremont, T. Dreossi, S. Ghosh, X. Yue, A. L. Sangiovanni-Vincentelli, and S. A. Seshia, “Scenic: A language for scenario specification and scene generation,” in PLDI, 2019.
- F. Hauer, T. Schmidt, B. Holzmüller, and A. Pretschner, “Did we test all scenarios for automated and autonomous driving systems?” in ITSC, 2019.
- L. Bergamini, Y. Ye, O. Scheel, L. Chen, C. Hu, L. D. Pero, B. Osinski, H. Grimmett, and P. Ondruska, “SimNet: Learning reactive self-driving simulations from real-world observations,” in ICRA, 2021.
- L. Mi, H. Zhao, C. Nash, X. Jin, J. Gao, C. Sun, C. Schmid, N. Shavit, Y. Chai, and D. Anguelov, “HDMapGen: A hierarchical graph generative model of high definition maps,” in CVPR, 2021.
- L. Feng, Q. Li, Z. Peng, S. Tan, and B. Zhou, “TrafficGen: Learning to generate diverse and realistic traffic scenarios,” in ICRA, 2023.
- S. Tan, K. Wong, S. Wang, S. Manivasagam, M. Ren, and R. Urtasun, “SceneGen: Learning to generate realistic traffic scenes,” in CVPR, 2021.
- S. Suo, S. Regalado, S. Casas, and R. Urtasun, “TrafficSim: Learning to simulate realistic multi-agent behaviors,” in CVPR, 2021.
- M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, 2000.
- Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in ICRA, 2023.
- D. Xu, Y. Chen, B. Ivanovic, and M. Pavone, “BITS: Bi-level imitation for traffic simulation,” in ICRA, 2023.
- Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “TrafficBots: Towards world models for autonomous driving simulation and motion prediction,” in ICRA, 2023.
- S. Manivasagam, S. Wang, K. Wong, W. Zeng, M. Sazanovich, S. Tan, B. Yang, W. Ma, and R. Urtasun, “LiDARsim: Realistic LiDAR simulation by leveraging the real world,” in CVPR, 2020.
- Z. Yang, Y. Chai, D. Anguelov, Y. Zhou, P. Sun, D. Erhan, S. Rafferty, and H. Kretzschmar, “SurfelGAN: Synthesizing realistic sensor data for autonomous driving,” in CVPR, 2020.
- Y. Chen, F. Rong, S. Duggal, S. Wang, X. Yan, S. Manivasagam, S. Xue, E. Yumer, and R. Urtasun, “GeoSim: Realistic video simulation via geometry-aware composition for self-driving,” in CVPR, 2021.
- Z. Yang, Y. Chen, J. Wang, S. Manivasagam, W.-C. Ma, A. J. Yang, and R. Urtasun, “UniSim: A neural closed-loop sensor simulator,” in CVPR, 2023.
- A. Petrenko, E. Wijmans, B. Shacklett, and V. Koltun, “Megaverse: Simulating embodied agents at one million experiences per second,” in ICML, 2021.
- Z. Song, Z. He, X. Li, Q. Ma, R. Ming, Z. Mao, H. Pei, L. Peng, J. Hu, D. Yao, et al., “Synthetic datasets for autonomous driving: A survey,” arXiv.org, vol. 2304.12205, 2023.
- A. Amini, I. Gilitschenski, J. Phillips, J. Moseyko, R. Banerjee, S. Karaman, and D. Rus, “Learning robust control policies for end-to-end autonomous driving from data-driven simulation,” RA-L, 2020.
- A. Amini, T.-H. Wang, I. Gilitschenski, W. Schwarting, Z. Liu, S. Han, S. Karaman, and D. Rus, “Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles,” in ICRA, 2022.
- T.-H. Wang, A. Amini, W. Schwarting, I. Gilitschenski, S. Karaman, and D. Rus, “Learning interactive driving policies via data-driven simulation,” in ICRA, 2022.
- B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
- H. Turki, D. Ramanan, and M. Satyanarayanan, “Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs,” in CVPR, 2022.
- A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert, and T. Funkhouser, “Panoptic neural fields: A semantic object-aware neural scene representation,” in CVPR, 2022.
- Y. Yang, Y. Yang, H. Guo, R. Xiong, Y. Wang, and Y. Liao, “UrbanGIRAFFE: Representing urban scenes as compositional generative neural feature fields,” arXiv.org, vol. 2303.14167, 2023.
- S. R. Richter, H. A. Alhaija, and V. Koltun, “Enhancing photorealism enhancement,” PAMI, 2023.
- F. Codevilla, A. M. Lopez, V. Koltun, and A. Dosovitskiy, “On offline evaluation of vision-based driving models,” in ECCV, 2018.
- D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with misconceptions about learning-based vehicle motion planning,” arXiv.org, vol. 2306.07962, 2023.
- J.-T. Zhai, Z. Feng, J. Du, Y. Mao, J.-J. Liu, Z. Tan, Y. Zhang, X. Ye, and J. Wang, “Rethinking the open-loop evaluation of end-to-end autonomous driving in nuScenes,” arXiv.org, vol. 2305.10430, 2023.
- H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in CVPR, 2020.
- M. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, “Argoverse: 3d tracking and forecasting with rich maps,” in CVPR, 2019.
- B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes, D. Ramanan, P. Carr, and J. Hays, “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” in NeurIPS, 2021.
- P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” in CVPR, 2020.
- T. Liang, H. Xie, K. Yu, Z. Xia, Z. Lin, Y. Wang, T. Tang, B. Wang, and Z. Tang, “BEVFusion: A simple and robust LiDAR-camera fusion framework,” in NeurIPS, 2022.
- Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. Rus, and S. Han, “BEVFusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation,” arXiv.org, vol. 2205.13542, 2022.
- A. Kim, A. Ošep, and L. Leal-Taixé, “EagerMOT: 3d multi-object tracking via sensor fusion,” in ICRA, 2021.
- H.-k. Chiu, J. Li, R. Ambruş, and J. Bohg, “Probabilistic 3d multi-modal, multi-object tracking for autonomous driving,” in ICRA, 2021.
- R. Zhang, S. A. Candra, K. Vetter, and A. Zakhor, “Sensor fusion for semantic segmentation of urban scenes,” in ICRA, 2015.
- G. P. Meyer, J. Charland, D. Hegde, A. Laddha, and C. Vallespi-Gonzalez, “Sensor fusion for joint 3d object detection and semantic segmentation,” in CVPR Workshops, 2019.
- Y. Xiao, F. Codevilla, A. Gurram, O. Urfalioglu, and A. M. López, “Multimodal end-to-end autonomous driving,” TITS, 2020.
- B. Zhou, P. Krähenbühl, and V. Koltun, “Does computer vision matter for action?” Science Robotics, 2019.
- P. Cai, S. Wang, H. Wang, and M. Liu, “CARL-LEAD: LiDAR-based end-to-end autonomous driving with contrastive deep reinforcement learning,” arXiv.org, vol. 2109.08473, 2021.
- Z. Gao, Y. Mu, R. Shen, C. Chen, Y. Ren, J. Chen, S. E. Li, P. Luo, and Y. Lu, “Enhance sample efficiency and robustness of end-to-end urban autonomous driving via semantic masked world model,” in NeurIPS Workshops, 2022.
- J. Chen, S. E. Li, and M. Tomizuka, “Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning,” TITS, 2022.
- Z. Huang, C. Lv, Y. Xing, and J. Wu, “Multi-modal sensor fusion-based deep neural network for end-to-end autonomous driving with scene understanding,” IEEE Sensors Journal, 2020.
- O. Natan and J. Miura, “Fully end-to-end autonomous driving with semantic depth cloud mapping and multi-agent,” in IV, 2022.
- I. Sobh, L. Amin, S. Abdelkarim, K. Elmadawy, M. Saeed, O. Abdeltawab, M. E. Gamal, and A. E. Sallab, “End-to-end multi-modal sensors fusion system for urban automated driving,” in NeurIPS Workshops, 2018.
- Y. Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue, and C. Wang, “LiDAR-video driving dataset: Learning driving policies effectively,” in CVPR, 2018.
- H. M. Eraqi, M. N. Moustafa, and J. Honer, “Dynamic conditional imitation learning for autonomous driving,” TITS, 2022.
- S. Chowdhuri, T. Pankaj, and K. Zipser, “MultiNet: Multi-modal multi-task learning for autonomous driving,” in WACV, 2019.
- P. Cai, S. Wang, Y. Sun, and M. Liu, “Probabilistic end-to-end vehicle navigation in complex dynamic environments with multimodal sensor fusion,” RA-L, 2020.
- Q. Zhang, M. Tang, R. Geng, F. Chen, R. Xin, and L. Wang, “MMFN: Multi-modal-fusion-net for end-to-end driving,” in IROS, 2022.
- H. Shao, L. Wang, R. Chen, S. L. Waslander, H. Li, and Y. Liu, “ReasonNet: End-to-end driving with temporal and global reasoning,” in CVPR, 2023.
- Y. Li, A. W. Yu, T. Meng, B. Caine, J. Ngiam, D. Peng, J. Shen, Y. Lu, D. Zhou, Q. V. Le, et al., “DeepFusion: LiDAR-camera deep fusion for multi-modal 3d object detection,” in CVPR, 2022.
- S. Borse, M. Klingner, V. R. Kumar, H. Cai, A. Almuzairee, S. Yogamani, and F. Porikli, “X-Align: Cross-modal cross-view alignment for bird’s-eye-view segmentation,” in WACV, 2023.
- P. Anderson, Q. Wu, D. Teney, J. Bruce, M. Johnson, N. Sünderhauf, I. Reid, S. Gould, and A. Van Den Hengel, “Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments,” in CVPR, 2018.
- M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in CoRL, 2022.
- J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan, “A survey of embodied AI: From simulators to research tasks,” TETCI, 2022.
- S. Vemprala, R. Bonatti, A. Bucker, and A. Kapoor, “ChatGPT for robotics: Design principles and model abilities,” arXiv.org, 2023.
- T. Deruyttere, S. Vandenhende, D. Grujicic, L. Van Gool, and M.-F. Moens, “Talk2Car: Taking control of your self-driving car,” in EMNLP, 2019.
- P. Mirowski, M. Grimes, M. Malinowski, K. M. Hermann, K. Anderson, D. Teplyashin, K. Simonyan, A. Zisserman, R. Hadsell, et al., “Learning to navigate in cities without a map,” in NeurIPS, 2018.
- H. Chen, A. Suhr, D. Misra, N. Snavely, and Y. Artzi, “Touchdown: Natural language navigation and spatial reasoning in visual street environments,” in CVPR, 2019.
- R. Schumann and S. Riezler, “Generating landmark navigation instructions from maps as a graph-to-text problem,” in ACL, 2021.
- J. Kim, T. Misu, Y.-T. Chen, A. Tawari, and J. Canny, “Grounding human-to-vehicle advice for self-driving vehicles,” in CVPR, 2019.
- S. Narayanan, T. Maniar, J. Kalyanasundaram, V. Gandhi, B. Bhowmick, and K. M. Krishna, “Talk to the vehicle: Language conditioned autonomous navigation of self-driving cars,” in IROS, 2019.
- J. Kim, S. Moon, A. Rohrbach, T. Darrell, and J. Canny, “Advisable learning for self-driving vehicles by internalizing observation-to-action rules,” in CVPR, 2020.
- J. Roh, C. Paxton, A. Pronobis, A. Farhadi, and D. Fox, “Conditional driving from natural language instructions,” in CoRL, 2019.
- K. Jain, V. Chhangani, A. Tiwari, K. M. Krishna, and V. Gandhi, “Ground then navigate: Language-guided navigation in dynamic scenes,” arXiv.org, vol. 2209.11972, 2022.
- D. Shah, B. Osiński, B. Ichter, and S. Levine, “LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action,” in CoRL, 2023.
- A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in ICML, 2021.
- T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” in NeurIPS, 2020.
- L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., “Training language models to follow instructions with human feedback,” in NeurIPS, 2022.
- OpenAI, “OpenAI: Introducing ChatGPT.” https://openai.com/blog/chatgpt, 2022.
- OpenAI, “GPT-4 Technical Report,” arXiv.org, vol. 2303.08774, 2023.
- B. Hilleli and R. El-Yaniv, “Toward deep reinforcement learning without a simulator: An autonomous steering example,” in AAAI, 2018.
- G. Wang, H. Niu, D. Zhu, J. Hu, X. Zhan, and G. Zhou, “A versatile and efficient reinforcement learning framework for autonomous driving,” arXiv.org, vol. 2110.11573, 2021.
- A. Behl, K. Chitta, A. Prakash, E. Ohn-Bar, and A. Geiger, “Label efficient visual abstractions for autonomous driving,” in IROS, 2020.
- S.-H. Chung, S.-H. Kong, S. Cho, and I. M. A. Nahrendra, “Segmented encoding for sim2real of RL-based end-to-end autonomous driving,” in IV, 2022.
- D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv.org, vol. 1312.6114, 2013.
- M. Ahmed, A. Abobakr, C. P. Lim, and S. Nahavandi, “Policy-based reinforcement learning for training autonomous driving agents in urban areas with affordance learning,” TITS, 2022.
- A. Sauer, N. Savinov, and A. Geiger, “Conditional affordance learning for driving in urban environments,” in CoRL, 2018.
- Z. Yuan, Z. Xue, B. Yuan, X. Wang, Y. Wu, Y. Gao, and H. Xu, “Pre-trained image encoder for generalizable visual reinforcement learning,” in NeurIPS, 2022.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in CVPR, 2009.
- X. Zhang, M. Wu, H. Ma, T. Hu, and J. Yuan, “Multi-task long-range urban driving based on hierarchical planning and reinforcement learning,” in ITSC, 2021.
- C. Huang, R. Zhang, M. Ouyang, P. Wei, J. Lin, J. Su, and L. Lin, “Deductive reinforcement learning for visual autonomous urban driving navigation,” TNNLS, 2021.
- R. Cheng, C. Agia, F. Shkurti, D. Meger, and G. Dudek, “Latent attention augmentation for robust autonomous driving policies,” in IROS, 2021.
- J. Yamada, K. Pertsch, A. Gunjal, and J. J. Lim, “Task-induced representation learning,” in ICLR, 2022.
- J. Chen and S. Pan, “Learning generalizable representations for reinforcement learning via adaptive meta-learner of behavioral similarities,” in ICLR, 2022.
- J. Wu, Z. Huang, and C. Lv, “Uncertainty-aware model-based reinforcement learning: Methodology and application in autonomous driving,” TIV, 2022.
- M. Pan, X. Zhu, Y. Wang, and X. Yang, “Iso-Dream: Isolating and leveraging noncontrollable visual dynamics in world models,” in NeurIPS, 2022.
- A. Hu, G. Corrado, N. Griffiths, Z. Murez, C. Gurau, H. Yeo, A. Kendall, R. Cipolla, and J. Shotton, “Model-based imitation learning for urban driving,” in NeurIPS, 2022.
- R. Caruana, “Multitask learning,” Machine Learning, 1997.
- A. Argyriou, T. Evgeniou, and M. Pontil, “Multi-task feature learning,” in NeurIPS, 2006.
- K. Ishihara, A. Kanervisto, J. Miura, and V. Hautamaki, “Multi-task learning with attention for end-to-end autonomous driving,” in CVPR Workshops, 2021.
- Z. Li, T. Motoyoshi, K. Sasaki, T. Ogata, and S. Sugano, “Rethinking self-driving: Multi-task knowledge for better generalization and accident explanation ability,” arXiv.org, vol. 1809.11100, 2018.
- H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” in CVPR, 2017.
- A. Mehta, A. Subramanian, and A. Subramanian, “Learning end-to-end autonomous driving using guided auxiliary supervision,” in ICVGIP, 2018.
- Y. Hou, Z. Ma, C. Liu, and C. C. Loy, “Learning to steer by mimicking features from heterogeneous auxiliary networks,” in AAAI, 2019.
- A. Zhao, T. He, Y. Liang, H. Huang, G. Van den Broeck, and S. Soatto, “SAM: Squeeze-and-mimic networks for conditional visual driving policy learning,” in CoRL, 2020.
- É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, “Explainability of deep vision-based autonomous driving systems: Review and challenges,” IJCV, 2022.
- M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, and U. Muller, “Explaining how a deep neural network trained with end-to-end learning steers a car,” arXiv.org, vol. 1704.07911, 2017.
- M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. D. Jackel, U. Muller, P. Yeres, and K. Zieba, “VisualBackProp: Efficient visualization of CNNs for autonomous driving,” in ICRA, 2018.
- S. Mohseni, A. Jagadeesh, and Z. Wang, “Predicting model failure using saliency maps in autonomous driving systems,” arXiv.org, vol. 1905.07679, 2019.
- J. Kim and J. Canny, “Interpretable learning for self-driving cars by visualizing causal attention,” in ICCV, 2017.
- K. Mori, H. Fukui, T. Murase, T. Hirakawa, T. Yamashita, and H. Fujiyoshi, “Visual explanation by attention branch network for end-to-end learning-based self-driving,” in IV, 2019.
- D. Wang, C. Devin, Q.-Z. Cai, F. Yu, and T. Darrell, “Deep object-centric policies for autonomous driving,” in ICRA, 2019.
- L. Cultrera, L. Seidenari, F. Becattini, P. Pala, and A. Del Bimbo, “Explaining autonomous driving by learning end-to-end visual attention,” in CVPR Workshops, 2020.
- Y. Xiao, F. Codevilla, D. P. Bustamante, and A. M. Lopez, “Scaling self-supervised end-to-end driving with multi-view attention learning,” arXiv.org, vol. 2302.03198, 2023.
- K. Renz, K. Chitta, O.-B. Mercea, A. S. Koepke, Z. Akata, and A. Geiger, “PlanT: Explainable planning transformers via object-level representations,” in CoRL, 2022.
- C. Liu, Y. Chen, M. Liu, and B. E. Shi, “Using eye gaze to enhance generalization of imitation networks to unseen environments,” TNNLS, 2020.
- W. Zeng, S. Wang, R. Liao, Y. Chen, B. Yang, and R. Urtasun, “DSDNet: Deep structured self-driving network,” in ECCV, 2020.
- A. Cui, S. Casas, A. Sadat, R. Liao, and R. Urtasun, “LookOut: Diverse multi-future prediction and planning for self-driving,” in ICCV, 2021.
- H. Ben-Younes, É. Zablocki, P. Pérez, and M. Cord, “Driving behavior explanation with multi-level fusion,” Pattern Recognition, 2022.
- Y. Xu, X. Yang, L. Gong, H.-C. Lin, T.-Y. Wu, Y. Li, and N. Vasconcelos, “Explainable object-induced action decision for autonomous vehicles,” in CVPR, 2020.
- B. Jin, X. Liu, Y. Zheng, P. Li, H. Zhao, T. Zhang, Y. Zheng, G. Zhou, and J. Liu, “ADAPT: Action-aware driving caption transformer,” in ICRA, 2023.
- R. Michelmore, M. Kwiatkowska, and Y. Gal, “Evaluating uncertainty quantification in end-to-end autonomous driving control,” arXiv.org, vol. 1811.06817, 2018.
- A. Filos, P. Tigkas, R. McAllister, N. Rhinehart, S. Levine, and Y. Gal, “Can autonomous vehicles identify, recover from, and adapt to distribution shifts?” in ICML, 2020.
- L. Tai, P. Yun, Y. Chen, C. Liu, H. Ye, and M. Liu, “Visual-based autonomous driving deployment from a stochastic and uncertainty-aware perspective,” in IROS, 2019.
- P. Cai, Y. Sun, H. Wang, and M. Liu, “VTGNet: A vision-based trajectory generation network for autonomous vehicles in urban environments,” TIV, 2020.
- R. Geirhos, J. Jacobsen, C. Michaelis, R. S. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann, “Shortcut learning in deep neural networks,” Nature Machine Intelligence, 2020.
- P. de Haan, D. Jayaraman, and S. Levine, “Causal confusion in imitation learning,” in NeurIPS, 2019.
- U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. LeCun, “Off-road obstacle avoidance through end-to-end learning,” in NeurIPS, 2005.
- M. Bansal, A. Krizhevsky, and A. S. Ogale, “ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst,” in RSS, 2019.
- C. Chuang, D. Yang, C. Wen, and Y. Gao, “Resolving copycat problems in visual imitation learning via residual action prediction,” in ECCV, 2022.
- L. Shen, Z. Lin, and Q. Huang, “Relay backpropagation for effective learning of deep convolutional neural networks,” in ECCV, 2016.
- M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” NN, 2018.
- J. Byrd and Z. Lipton, “What is the effect of importance weighting in deep learning?” in ICML, 2019.
- A. Gupta, P. Dollar, and R. Girshick, “LVIS: A dataset for large vocabulary instance segmentation,” in CVPR, 2019.
- J. Peng, X. Bu, M. Sun, Z. Zhang, T. Tan, and J. Yan, “Large-scale object detection in the wild from imbalanced multi-labels,” in CVPR, 2020.
- I. Mani and I. Zhang, “kNN approach to unbalanced data distributions: A case study involving information extraction,” in ICML Workshops, 2003.
- X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” TCYB, 2008.
- D. Devi, B. Purkayastha, et al., “Redundancy-driven modified tomek-link based undersampling: A solution to class imbalance,” Pattern Recognition Letters, 2017.
- S. Gidaris and N. Komodakis, “Dynamic few-shot visual learning without forgetting,” in CVPR, 2018.
- H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in ICLR, 2018.
- H.-P. Chou, S.-C. Chang, J.-Y. Pan, W. Wei, and D.-C. Juan, “Remix: Rebalanced mixup,” in ECCV, 2020.
- C. Huang, Y. Li, C. C. Loy, and X. Tang, “Learning deep representation for imbalanced classification,” in CVPR, 2016.
- Y.-X. Wang, D. Ramanan, and M. Hebert, “Learning to model the tail,” in NeurIPS, 2017.
- T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in ICCV, 2017.
- Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in CVPR, 2019.
- J. Tan, C. Wang, B. Li, Q. Li, W. Ouyang, C. Yin, and J. Yan, “Equalization loss for long-tailed object recognition,” in CVPR, 2020.
- J. Tan, X. Lu, G. Zhang, C. Yin, and Q. Li, “Equalization loss v2: A new gradient balance approach for long-tailed object detection,” in CVPR, 2021.
- B. Li, Y. Yao, J. Tan, G. Zhang, F. Yu, J. Lu, and Y. Luo, “Equalized focal loss for dense long-tailed object detection,” in CVPR, 2022.
- S. Akhauri, L. Y. Zheng, and M. C. Lin, “Enhanced transfer learning for autonomous driving with systematic accident simulation,” in IROS, 2020.
- Q. Li, Z. Peng, Q. Zhang, C. Liu, and B. Zhou, “Improving the generalization of end-to-end driving through procedural generation,” arXiv.org, vol. 2012.13681, 2020.
- M. O’Kelly, A. Sinha, H. Namkoong, R. Tedrake, and J. C. Duchi, “Scalable end-to-end autonomous vehicle testing via rare-event simulation,” in NeurIPS, 2018.
- W. Ding, B. Chen, B. Li, K. J. Eun, and D. Zhao, “Multimodal safety-critical scenarios generation for decision-making algorithms evaluation,” RA-L, 2021.
- L. T. Triess, M. Dreissig, C. B. Rist, and J. M. Zöllner, “A survey on deep domain adaptation for LiDAR perception,” in IV Workshops, 2021.
- Y. You, X. Pan, Z. Wang, and C. Lu, “Virtual to real reinforcement learning for autonomous driving,” in BMVC, 2017.
- A. Bewley, J. Rigley, Y. Liu, J. Hawke, R. Shen, V.-D. Lam, and A. Kendall, “Learning to drive from simulation without real world labels,” in ICRA, 2019.
- J. Xing, T. Nagata, K. Chen, X. Zou, E. Neftci, and J. L. Krichmar, “Domain adaptation in reinforcement learning via latent unified state representation,” in AAAI, 2021.
- J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in IROS, 2017.
- X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in ICRA, 2018.
- J. Matas, S. James, and A. J. Davison, “Sim-to-real reinforcement learning for deformable object manipulation,” in CoRL, 2018.
- B. Osiński, A. Jakubowski, P. Zięcina, P. Miłoś, C. Galias, S. Homoceanu, and H. Michalewski, “Simulation-based reinforcement learning for real-world autonomous driving,” in ICRA, 2020.
- M. Tancik, V. Casser, X. Yan, S. Pradhan, B. Mildenhall, P. P. Srinivasan, J. T. Barron, and H. Kretzschmar, “Block-NeRF: Scalable large scene neural view synthesis,” in CVPR, 2022.
- P. Karkus, B. Ivanovic, S. Mannor, and M. Pavone, “DiffStack: A differentiable and modular control stack for autonomous vehicles,” in CoRL, 2022.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment anything,” arXiv.org, vol. 2304.02643, 2023.
- H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., “LLaMA: Open and efficient foundation language models,” arXiv.org, vol. 2302.13971, 2023.
- S. Narang and A. Chowdhery, “Pathways Language Model (PaLM): Scaling to 540 billion parameters for breakthrough performance,” Google AI Blog, 2022.
- Y. Fang, W. Wang, B. Xie, Q. Sun, L. Wu, X. Wang, T. Huang, X. Wang, and Y. Cao, “EVA: Exploring the limits of masked visual representation learning at scale,” arXiv.org, vol. 2211.07636, 2022.
- Q. Sun, Y. Fang, L. Wu, X. Wang, and Y. Cao, “EVA-CLIP: Improved training techniques for CLIP at scale,” arXiv.org, vol. 2303.15389, 2023.
- M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al., “DINOv2: Learning robust visual features without supervision,” arXiv.org, vol. 2304.07193, 2023.
- J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, et al., “Flamingo: a visual language model for few-shot learning,” in NeurIPS, 2022.
- W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li, et al., “InternImage: Exploring large-scale vision foundation models with deformable convolutions,” in CVPR, 2023.
- K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in CVPR, 2022.
- T.-H. Wang, S. Manivasagam, M. Liang, B. Yang, W. Zeng, and R. Urtasun, “V2VNet: Vehicle-to-vehicle communication for joint perception and prediction,” in ECCV, 2020.
- Y.-C. Liu, J. Tian, C.-Y. Ma, N. Glaser, C.-W. Kuo, and Z. Kira, “Who2com: Collaborative perception via learnable handshake communication,” in ICRA, 2020.
- Y.-C. Liu, J. Tian, N. Glaser, and Z. Kira, “When2com: Multi-agent perception via communication graph grouping,” in CVPR, 2020.
- J. Cui, H. Qiu, D. Chen, P. Stone, and Y. Zhu, “COOPERNAUT: End-to-end driving with cooperative perception for networked vehicles,” in CVPR, 2022.
- R. Xu, H. Xiang, Z. Tu, X. Xia, M.-H. Yang, and J. Ma, “V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer,” in ECCV, 2022.