Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (2405.05258v1)
Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. To address this, we study semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and the complementary nature of multiple sensors to make better use of unlabeled data. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) a multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance that generates auxiliary supervision with open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and substantially improving over supervised-only baselines. This advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
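To make the core LaserMix idea concrete, the following is a minimal sketch (not the authors' implementation) of how mixing two LiDAR scans by laser inclination bands might look. The function name, the number of bands, and the pitch range (a nuScenes-like vertical field of view) are assumptions for illustration; the multi-modal extensions in LaserMix++ (camera-to-LiDAR distillation and language-driven guidance) are not covered here.

```python
import numpy as np

def lasermix_inclination(points_a, labels_a, points_b, labels_b,
                         num_areas=4, pitch_min=-25.0, pitch_max=3.0):
    """Hypothetical sketch: mix two LiDAR scans by alternating inclination (pitch) bands.

    points_*: (N, 3+) arrays with x, y, z in the first three columns.
    labels_*: (N,) per-point labels (ground truth or pseudo-labels).
    Returns two complementary mixed scans for consistency training.
    """
    def pitch_deg(points):
        # Inclination angle of each point relative to the sensor (degrees).
        depth = np.linalg.norm(points[:, :2], axis=1)
        return np.degrees(np.arctan2(points[:, 2], depth))

    # Evenly split the assumed vertical field of view into contiguous pitch bands.
    edges = np.linspace(pitch_min, pitch_max, num_areas + 1)

    def band_index(points):
        idx = np.digitize(pitch_deg(points), edges) - 1
        return np.clip(idx, 0, num_areas - 1)

    idx_a, idx_b = band_index(points_a), band_index(points_b)

    # Mixed scan 1 takes even bands from A and odd bands from B; scan 2 is the complement.
    take_a = idx_a % 2 == 0
    take_b = idx_b % 2 == 1
    mixed1 = (np.concatenate([points_a[take_a], points_b[take_b]]),
              np.concatenate([labels_a[take_a], labels_b[take_b]]))
    mixed2 = (np.concatenate([points_a[~take_a], points_b[~take_b]]),
              np.concatenate([labels_a[~take_a], labels_b[~take_b]]))
    return mixed1, mixed2
```

In a semi-supervised setup, the mixed scans would typically be fed to a student network whose predictions are encouraged to agree with the correspondingly mixed teacher predictions, which is the consistency-regularization pattern the abstract describes.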
Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu