Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (2405.05258v1)
Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. To address this, we study semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and the complementary nature of multiple sensors to make better use of unlabeled data. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) a multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance that generates auxiliary supervision with open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and substantially improving over supervised-only baselines. This advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
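To make the core LaserMix idea concrete, the following is a minimal sketch (not the authors' implementation) of how mixing two LiDAR scans by laser inclination bands might look. The function name, the number of bands, and the pitch range (a nuScenes-like vertical field of view) are assumptions for illustration; the multi-modal extensions in LaserMix++ (camera-to-LiDAR distillation and language-driven guidance) are not covered here.

```python
import numpy as np

def lasermix_inclination(points_a, labels_a, points_b, labels_b,
                         num_areas=4, pitch_min=-25.0, pitch_max=3.0):
    """Hypothetical sketch: mix two LiDAR scans by alternating inclination (pitch) bands.

    points_*: (N, 3+) arrays with x, y, z in the first three columns.
    labels_*: (N,) per-point labels (ground truth or pseudo-labels).
    Returns two complementary mixed scans for consistency training.
    """
    def pitch_deg(points):
        # Inclination angle of each point relative to the sensor (degrees).
        depth = np.linalg.norm(points[:, :2], axis=1)
        return np.degrees(np.arctan2(points[:, 2], depth))

    # Evenly split the assumed vertical field of view into contiguous pitch bands.
    edges = np.linspace(pitch_min, pitch_max, num_areas + 1)

    def band_index(points):
        idx = np.digitize(pitch_deg(points), edges) - 1
        return np.clip(idx, 0, num_areas - 1)

    idx_a, idx_b = band_index(points_a), band_index(points_b)

    # Mixed scan 1 takes even bands from A and odd bands from B; scan 2 is the complement.
    take_a = idx_a % 2 == 0
    take_b = idx_b % 2 == 1
    mixed1 = (np.concatenate([points_a[take_a], points_b[take_b]]),
              np.concatenate([labels_a[take_a], labels_b[take_b]]))
    mixed2 = (np.concatenate([points_a[~take_a], points_b[~take_b]]),
              np.concatenate([labels_a[~take_a], labels_b[~take_b]]))
    return mixed1, mixed2
```

In a semi-supervised setup, the mixed scans would typically be fed to a student network whose predictions are encouraged to agree with the correspondingly mixed teacher predictions, which is the consistency-regularization pattern the abstract describes.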
Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu