UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation (2403.02642v1)
Abstract: Autonomous off-road navigation requires an accurate semantic understanding of the environment, often converted into a bird's-eye-view (BEV) representation for downstream tasks. While learning-based methods have shown success in generating local semantic terrain maps directly from sensor data, their efficacy in off-road environments is hindered by the difficulty of accurately representing uncertain terrain features. This paper presents a learning-based fusion method for generating dense terrain-classification maps in BEV. By performing LiDAR-image fusion at multiple scales, our approach improves the accuracy of semantic maps generated from an RGB image and a single-sweep LiDAR scan. Uncertainty-aware pseudo-labels further allow the network to learn reliably in off-road environments without precise 3D annotations. Through thorough experiments on off-road driving datasets, we demonstrate that our method improves accuracy on challenging terrain, supporting reliable and safe autonomous navigation in off-road settings.
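The abstract names two technical ingredients: multi-scale LiDAR-image fusion in BEV and training against uncertainty-aware pseudo-labels. Since this excerpt contains no code, the following is a minimal PyTorch sketch of how those ingredients could fit together. Every name, shape, and the particular confidence weighting (`MultiScaleBEVFusion`, `uncertainty_weighted_ce`, the shared channel width, the coarse-to-fine ordering) is an assumption made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleBEVFusion(nn.Module):
    """Illustrative sketch: fuse per-scale image and LiDAR BEV feature maps
    (assumed to share a channel width C) and decode a dense terrain-class map.
    The modality encoders and camera/LiDAR view transforms that would produce
    the BEV features are omitted."""

    def __init__(self, c=64, num_scales=2, num_classes=5):
        super().__init__()
        # One 1x1 fusion conv per scale: concatenated image + LiDAR BEV
        # features are projected back to the shared channel width.
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for _ in range(num_scales))
        self.head = nn.Conv2d(c, num_classes, kernel_size=1)

    def forward(self, img_feats, lidar_feats):
        # img_feats / lidar_feats: lists of BEV tensors ordered coarse -> fine,
        # each of shape (B, C, H_s, W_s).
        fused = None
        for conv, fi, fl in zip(self.fuse, img_feats, lidar_feats):
            x = conv(torch.cat([fi, fl], dim=1))
            if fused is not None:
                # Upsample the coarser fused map and add it to the finer one.
                x = x + F.interpolate(fused, size=x.shape[-2:],
                                      mode="bilinear", align_corners=False)
            fused = x
        return self.head(fused)  # (B, num_classes, H, W) BEV logits


def uncertainty_weighted_ce(logits, pseudo_labels, confidence,
                            ignore_index=255):
    """Per-cell cross-entropy down-weighted by pseudo-label confidence
    (e.g. 1 - estimated uncertainty), so unreliable BEV cells contribute
    less to the gradient. Shapes: logits (B, C, H, W), pseudo_labels
    (B, H, W) long, confidence (B, H, W) in [0, 1]."""
    per_cell = F.cross_entropy(logits, pseudo_labels,
                               ignore_index=ignore_index, reduction="none")
    valid = (pseudo_labels != ignore_index).float()
    return (per_cell * confidence * valid).sum() / valid.sum().clamp(min=1.0)
```

Down-weighting rather than hard-thresholding low-confidence cells keeps gradient signal from regions where the pseudo-labels are noisy but not useless, which is one plausible reading of "uncertainty-aware" here.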