UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation (2403.02642v1)

Published 5 Mar 2024 in cs.RO and cs.CV

Abstract: Autonomous off-road navigation requires an accurate semantic understanding of the environment, often converted into a bird's-eye view (BEV) representation for various downstream tasks. While learning-based methods have shown success in generating local semantic terrain maps directly from sensor data, their efficacy in off-road environments is hindered by challenges in accurately representing uncertain terrain features. This paper presents a learning-based fusion method for generating dense terrain classification maps in BEV. By performing LiDAR-image fusion at multiple scales, our approach enhances the accuracy of semantic maps generated from an RGB image and a single-sweep LiDAR scan. Utilizing uncertainty-aware pseudo-labels further enhances the network's ability to learn reliably in off-road environments without requiring precise 3D annotations. By conducting thorough experiments using off-road driving datasets, we demonstrate that our method can improve accuracy in off-road terrains, validating its efficacy in facilitating reliable and safe autonomous navigation in challenging off-road settings.

Summary

  • The paper presents a novel uncertainty-aware LiDAR-image fusion framework that integrates multi-scale features for accurate off-road semantic terrain mapping.
  • It employs pseudo-label generation with uncertainty estimation to refine classification without the need for dense 3D annotations.
  • Experiments on the RELLIS-3D dataset demonstrate superior mIoU and robust performance in complex off-road environments.

Uncertainty-aware LiDAR-image Fusion for Semantic Terrain Map Estimation in Off-road Environments

This paper proposes a novel approach for generating semantic terrain maps in bird’s-eye view (BEV) for autonomous navigation in unstructured off-road settings. The method incorporates uncertainty-aware multi-modal data fusion, utilizing both LiDAR and RGB camera inputs to improve the precision and reliability of semantic classification maps without requiring precise 3D annotations. By leveraging uncertainty-aware pseudo-labels, the framework addresses the inherent variability and complex geometric characteristics of off-road environments.

Methodology

The core of the method lies in fusing LiDAR and image data to improve semantic terrain map estimation. Key aspects of the proposed method include:

  • Multi-scale LiDAR-image Fusion: The approach integrates features from LiDAR point clouds and RGB images at multiple scales, employing an attentive fusion strategy that combines the spatial richness of visual data with the geometric accuracy of LiDAR measurements. This fusion is designed to improve the representation and classification of the diverse terrain features commonly encountered in off-road environments (a fusion sketch follows this list).
  • Pseudo-label Generation with Uncertainty Estimation: Instead of relying on dense manual labeling, which is costly and labor-intensive, the method generates pseudo-labels with pre-trained image segmentation models. These pseudo-labels are refined using uncertainty estimation that gauges the consistency of label predictions across multiple temporal frames. This yields a confidence measure for each grid cell of the semantic BEV map and guides training by down-weighting less certain labels in the loss function (see the loss-weighting sketch after this list).
  • Network Architecture: The BEV semantic fusion network is built on a layered 3D U-Net architecture that processes LiDAR and image features separately before fusing them. A subsequent 2D convolutional network refines the BEV feature map to generate the final semantic terrain classification.
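To make the attentive fusion idea concrete, below is a minimal PyTorch sketch of a per-scale gated fusion of LiDAR and image BEV features. The module names, channel sizes, and gating scheme are illustrative assumptions rather than the paper's exact architecture; the sketch only shows the general pattern of projecting both modalities into a common space and weighting them per BEV cell.

```python
# Hypothetical sketch of multi-scale attentive LiDAR-image fusion in BEV.
# Module names, channel sizes, and the gating scheme are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class AttentiveFusionBlock(nn.Module):
    """Fuse a LiDAR BEV feature map with an image feature map at one scale."""

    def __init__(self, lidar_ch: int, image_ch: int, out_ch: int):
        super().__init__()
        self.lidar_proj = nn.Conv2d(lidar_ch, out_ch, kernel_size=1)
        self.image_proj = nn.Conv2d(image_ch, out_ch, kernel_size=1)
        # Per-cell gate deciding how much to trust each modality.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, lidar_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        l = self.lidar_proj(lidar_feat)            # (B, C, H, W)
        i = self.image_proj(image_feat)            # (B, C, H, W), assumed already in BEV
        w = self.gate(torch.cat([l, i], dim=1))    # (B, 2, H, W) per-cell modality weights
        return w[:, :1] * l + w[:, 1:] * i


class MultiScaleFusion(nn.Module):
    """Apply attentive fusion at several scales and merge into one BEV feature map."""

    def __init__(self, lidar_chs, image_chs, out_ch=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentiveFusionBlock(lc, ic, out_ch) for lc, ic in zip(lidar_chs, image_chs)
        )

    def forward(self, lidar_feats, image_feats):
        # Fuse per scale, upsample everything to the finest resolution, and sum.
        fused = [blk(l, i) for blk, l, i in zip(self.blocks, lidar_feats, image_feats)]
        target = fused[0].shape[-2:]
        fused = [nn.functional.interpolate(f, size=target, mode="bilinear",
                                           align_corners=False) for f in fused]
        return torch.stack(fused, dim=0).sum(dim=0)
```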
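The uncertainty-aware loss weighting can likewise be sketched as follows, assuming uncertainty is derived from how strongly pseudo-labels projected from different frames agree on each BEV cell. The consistency measure and the (1 - uncertainty) weighting are assumptions made for illustration; the paper's exact formulation may differ.

```python
# Hypothetical sketch of uncertainty-aware pseudo-label weighting.
# The consistency-based uncertainty and the weighting rule are assumptions
# for illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def pseudo_label_uncertainty(votes: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Estimate per-cell uncertainty from pseudo-labels accumulated over T frames.

    votes: (T, H, W) integer class labels projected into the same BEV grid,
           with -1 marking cells a frame did not observe.
    Returns: (H, W) uncertainty in [0, 1], high when frames disagree.
    """
    T, H, W = votes.shape
    counts = torch.zeros(num_classes, H, W)
    for c in range(num_classes):
        counts[c] = (votes == c).sum(dim=0).float()
    observed = counts.sum(dim=0).clamp(min=1.0)
    agreement = counts.max(dim=0).values / observed   # fraction voting for the majority class
    return 1.0 - agreement


def weighted_ce_loss(logits: torch.Tensor, pseudo_labels: torch.Tensor,
                     uncertainty: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over BEV cells, down-weighting uncertain pseudo-labels.

    logits: (B, C, H, W); pseudo_labels: (B, H, W) long, -1 for unobserved cells;
    uncertainty: (B, H, W) in [0, 1].
    """
    per_cell = F.cross_entropy(logits, pseudo_labels,
                               reduction="none", ignore_index=-1)  # (B, H, W)
    weights = 1.0 - uncertainty                                    # trust confident cells more
    return (weights * per_cell).sum() / weights.sum().clamp(min=1e-6)
```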

Experimental Results and Implications

The proposed approach is evaluated on the RELLIS-3D dataset, which targets off-road environments. The results show higher classification accuracy and mean Intersection over Union (mIoU) than existing methods such as PyrOccNet and BEVNet: the method reaches an mIoU of 35.8% and performs strongly on challenging classes such as dirt roads and vegetation.
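For reference, mIoU is the standard per-class intersection over union averaged over classes; the short NumPy sketch below (not code from the paper) shows how such a number is computed from a confusion matrix.

```python
# Standard per-class IoU and mIoU from a confusion matrix; shown only to make
# the reported metric concrete, not code from the paper.
import numpy as np

def mean_iou(confusion: np.ndarray) -> float:
    """confusion[i, j] = number of BEV cells with ground-truth class i predicted as class j."""
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp
    fn = confusion.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1e-9)   # per-class intersection over union
    return float(iou.mean())
```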

The improvements are particularly notable for terrain types characterized by significant intra-class variation and complex boundary geometries, a testament to the effectiveness of the fusion strategy. The paper's results suggest that uncertainty-aware fusion significantly enhances robustness and accuracy in semantic terrain mapping, offering potentially safer and more reliable autonomous navigation in off-road scenarios.

Future Directions

This work opens several avenues for future research. More sophisticated fusion strategies, for example transformer-based architectures, could further enhance semantic understanding. Integrating additional modalities such as radar could improve the reliability of terrain mapping under adverse atmospheric conditions. Finally, addressing domain adaptation so that the model can be deployed across varied geographical terrains would broaden the generalizability and applicability of the proposed method.

In conclusion, the paper presents a compelling method that advances semantic terrain map estimation in off-road environments through innovative use of sensor fusion and uncertainty quantification. This aligns with broader research goals of enhancing autonomous navigation capabilities across diverse and unstructured natural terrains.
