Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (2405.05258v1)

Published 8 May 2024 in cs.CV, cs.LG, and cs.RO

Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
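The core mechanism named in the abstract is the LaserMix operation: two LiDAR scans are partitioned into non-overlapping laser-beam (inclination) bands, and alternating bands are swapped between them so that consistency can be enforced on the mixed scan. Below is a minimal sketch of that partition-and-swap step for a single pair of scans; the function name, array layout, and the num_areas parameter are illustrative assumptions rather than the authors' implementation, and the camera-to-LiDAR distillation and language-driven guidance components are omitted.

```python
import numpy as np

def lasermix_pair(points_a, labels_a, points_b, labels_b, num_areas=4):
    """Sketch of an inclination-based, LaserMix-style mixing step.

    points_*: (N, 4) arrays of [x, y, z, intensity]; labels_*: (N,) arrays.
    The shared inclination range is split into `num_areas` non-overlapping
    bands; even bands are taken from scan A and odd bands from scan B.
    """
    def inclination(pts):
        # Elevation angle of each point above the horizontal (x-y) plane.
        return np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1))

    inc_a, inc_b = inclination(points_a), inclination(points_b)
    lo = min(inc_a.min(), inc_b.min())
    hi = max(inc_a.max(), inc_b.max())
    edges = np.linspace(lo, hi, num_areas + 1)

    # Band index per point (clip so the maximum value falls in the last band).
    band_a = np.clip(np.digitize(inc_a, edges) - 1, 0, num_areas - 1)
    band_b = np.clip(np.digitize(inc_b, edges) - 1, 0, num_areas - 1)

    take_a = band_a % 2 == 0   # even bands from scan A
    take_b = band_b % 2 == 1   # odd bands from scan B
    mixed_points = np.concatenate([points_a[take_a], points_b[take_b]])
    mixed_labels = np.concatenate([labels_a[take_a], labels_b[take_b]])
    return mixed_points, mixed_labels
```

In a semi-supervised setting of the kind the abstract describes, one scan would typically come from the labeled set and the other from the unlabeled set (carrying pseudo-labels), and the mixed scan would be used to enforce prediction consistency across the two sources.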

Authors (8)
  1. Lingdong Kong (49 papers)
  2. Xiang Xu (83 papers)
  3. Jiawei Ren (33 papers)
  4. Wenwei Zhang (77 papers)
  5. Liang Pan (93 papers)
  6. Kai Chen (512 papers)
  7. Wei Tsang Ooi (26 papers)
  8. Ziwei Liu (368 papers)
Citations (8)
