UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation (2403.02642v1)

Published 5 Mar 2024 in cs.RO and cs.CV

Abstract: Autonomous off-road navigation requires an accurate semantic understanding of the environment, often converted into a bird's-eye view (BEV) representation for various downstream tasks. While learning-based methods have shown success in generating local semantic terrain maps directly from sensor data, their efficacy in off-road environments is hindered by challenges in accurately representing uncertain terrain features. This paper presents a learning-based fusion method for generating dense terrain classification maps in BEV. By performing LiDAR-image fusion at multiple scales, our approach enhances the accuracy of semantic maps generated from an RGB image and a single-sweep LiDAR scan. Utilizing uncertainty-aware pseudo-labels further enhances the network's ability to learn reliably in off-road environments without requiring precise 3D annotations. By conducting thorough experiments using off-road driving datasets, we demonstrate that our method can improve accuracy in off-road terrains, validating its efficacy in facilitating reliable and safe autonomous navigation in challenging off-road settings.

Summary

  • The paper presents a novel uncertainty-aware LiDAR-image fusion framework that integrates multi-scale features for accurate off-road semantic terrain mapping.
  • It employs pseudo-label generation with uncertainty estimation to refine classification without the need for dense 3D annotations.
  • Experiments on the RELLIS-3D dataset demonstrate superior mIoU and robust performance in complex off-road environments.

Uncertainty-aware LiDAR-image Fusion for Semantic Terrain Map Estimation in Off-road Environments

This paper proposes a novel approach for generating semantic terrain maps in bird’s-eye view (BEV) for autonomous navigation in unstructured off-road settings. The method incorporates uncertainty-aware multi-modal data fusion, utilizing both LiDAR and RGB camera inputs to improve the precision and reliability of semantic classification maps without requiring precise 3D annotations. By leveraging uncertainty-aware pseudo-labels, the framework addresses the inherent variability and complex geometric characteristics of off-road environments.

Methodology

The core of the method lies in fusing LiDAR and image data to improve semantic terrain map estimation. Key aspects of the proposed method include:

  • Multi-scale LiDAR-image Fusion: The approach integrates features from LiDAR point clouds and RGB images at multiple scales, employing an attentive fusion strategy that combines the spatial richness of visual data with the geometric accuracy of LiDAR measurements. This fusion is designed to improve the representation and classification of the diverse terrain features commonly encountered in off-road environments (a fusion sketch follows this list).
  • Pseudo-label Generation with Uncertainty Estimation: Instead of relying on dense manual labeling, which is costly and labor-intensive, the method generates pseudo-labels with pre-trained image segmentation models. These pseudo-labels are refined using uncertainty estimation that gauges the consistency of label predictions across multiple temporal frames. This yields a confidence measure for each grid cell of the semantic BEV map and guides training by down-weighting less certain labels in the loss function (see the loss-weighting sketch after this list).
  • Network Architecture: The BEV semantic fusion network is built on a layered 3D U-Net architecture that processes LiDAR and image features separately before fusing them. A subsequent 2D convolutional network refines the BEV feature map to generate the final semantic terrain classification.
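To make the attentive fusion idea concrete, below is a minimal PyTorch sketch of a per-scale gated fusion of LiDAR and image BEV features. The module names, channel sizes, and gating scheme are illustrative assumptions rather than the paper's exact architecture; the sketch only shows the general pattern of projecting both modalities into a common space and weighting them per BEV cell.

```python
# Hypothetical sketch of multi-scale attentive LiDAR-image fusion in BEV.
# Module names, channel sizes, and the gating scheme are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class AttentiveFusionBlock(nn.Module):
    """Fuse a LiDAR BEV feature map with an image feature map at one scale."""

    def __init__(self, lidar_ch: int, image_ch: int, out_ch: int):
        super().__init__()
        self.lidar_proj = nn.Conv2d(lidar_ch, out_ch, kernel_size=1)
        self.image_proj = nn.Conv2d(image_ch, out_ch, kernel_size=1)
        # Per-cell gate deciding how much to trust each modality.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, lidar_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        l = self.lidar_proj(lidar_feat)            # (B, C, H, W)
        i = self.image_proj(image_feat)            # (B, C, H, W), assumed already in BEV
        w = self.gate(torch.cat([l, i], dim=1))    # (B, 2, H, W) per-cell modality weights
        return w[:, :1] * l + w[:, 1:] * i


class MultiScaleFusion(nn.Module):
    """Apply attentive fusion at several scales and merge into one BEV feature map."""

    def __init__(self, lidar_chs, image_chs, out_ch=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentiveFusionBlock(lc, ic, out_ch) for lc, ic in zip(lidar_chs, image_chs)
        )

    def forward(self, lidar_feats, image_feats):
        # Fuse per scale, upsample everything to the finest resolution, and sum.
        fused = [blk(l, i) for blk, l, i in zip(self.blocks, lidar_feats, image_feats)]
        target = fused[0].shape[-2:]
        fused = [nn.functional.interpolate(f, size=target, mode="bilinear",
                                           align_corners=False) for f in fused]
        return torch.stack(fused, dim=0).sum(dim=0)
```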
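The uncertainty-aware loss weighting can likewise be sketched as follows, assuming uncertainty is derived from how strongly pseudo-labels projected from different frames agree on each BEV cell. The consistency measure and the (1 - uncertainty) weighting are assumptions made for illustration; the paper's exact formulation may differ.

```python
# Hypothetical sketch of uncertainty-aware pseudo-label weighting.
# The consistency-based uncertainty and the weighting rule are assumptions
# for illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def pseudo_label_uncertainty(votes: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Estimate per-cell uncertainty from pseudo-labels accumulated over T frames.

    votes: (T, H, W) integer class labels projected into the same BEV grid,
           with -1 marking cells a frame did not observe.
    Returns: (H, W) uncertainty in [0, 1], high when frames disagree.
    """
    T, H, W = votes.shape
    counts = torch.zeros(num_classes, H, W)
    for c in range(num_classes):
        counts[c] = (votes == c).sum(dim=0).float()
    observed = counts.sum(dim=0).clamp(min=1.0)
    agreement = counts.max(dim=0).values / observed   # fraction voting for the majority class
    return 1.0 - agreement


def weighted_ce_loss(logits: torch.Tensor, pseudo_labels: torch.Tensor,
                     uncertainty: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over BEV cells, down-weighting uncertain pseudo-labels.

    logits: (B, C, H, W); pseudo_labels: (B, H, W) long, -1 for unobserved cells;
    uncertainty: (B, H, W) in [0, 1].
    """
    per_cell = F.cross_entropy(logits, pseudo_labels,
                               reduction="none", ignore_index=-1)  # (B, H, W)
    weights = 1.0 - uncertainty                                    # trust confident cells more
    return (weights * per_cell).sum() / weights.sum().clamp(min=1e-6)
```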

Experimental Results and Implications

The proposed approach is evaluated on the RELLIS-3D dataset, which targets off-road environments. The results show higher classification accuracy and mean Intersection over Union (mIoU) than existing methods such as PyrOccNet and BEVNet: the method reaches an mIoU of 35.8% and performs strongly on challenging classes such as dirt roads and vegetation.
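For reference, mIoU is the standard per-class intersection over union averaged over classes; the short NumPy sketch below (not code from the paper) shows how such a number is computed from a confusion matrix.

```python
# Standard per-class IoU and mIoU from a confusion matrix; shown only to make
# the reported metric concrete, not code from the paper.
import numpy as np

def mean_iou(confusion: np.ndarray) -> float:
    """confusion[i, j] = number of BEV cells with ground-truth class i predicted as class j."""
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp
    fn = confusion.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1e-9)   # per-class intersection over union
    return float(iou.mean())
```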

The improvements are particularly notable for terrain types characterized by significant intra-class variation and complex boundary geometries, a testament to the effectiveness of the fusion strategy. The paper's results suggest that uncertainty-aware fusion significantly enhances robustness and accuracy in semantic terrain mapping, offering potentially safer and more reliable autonomous navigation in off-road scenarios.

Future Directions

This work opens several avenues for future research. More sophisticated fusion strategies, for example transformer-based architectures, could further enhance semantic understanding. Integrating additional modalities such as radar could improve the reliability of terrain mapping under adverse atmospheric conditions. Finally, addressing domain adaptation so that the model can be deployed across varied geographical terrains would broaden the generalizability and applicability of the proposed method.

In conclusion, the paper presents a compelling method that advances semantic terrain map estimation in off-road environments through innovative use of sensor fusion and uncertainty quantification. This aligns with broader research goals of enhancing autonomous navigation capabilities across diverse and unstructured natural terrains.
