PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation (2212.14197v4)

Published 29 Dec 2022 in cs.CV

Abstract: The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Unlike existing pre-training paradigms for deep point cloud feature extractors, which fall under either generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin by deducing view-conditioned point-wise embeddings through the insertion of a viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which can be fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations on various downstream task scenarios demonstrate that PointVST consistently and markedly outperforms current state-of-the-art approaches while exhibiting satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST.
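
The abstract outlines a three-step pretext pipeline: fuse a viewpoint indicator into point-wise embeddings, adaptively aggregate them into a view-specific global codeword, and decode that codeword with a 2D convolutional translation head to regress the rendered view. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: all module names, dimensions, the attention-style aggregation, and the MSE rendering loss are illustrative stand-ins, not the authors' implementation (see the linked repository for the official code).

```python
# Minimal, assumption-laden sketch of the PointVST-style translative pretext
# task. Backbone, dimensions, and loss are hypothetical placeholders.
import torch
import torch.nn as nn

class ViewSpecificTranslationHead(nn.Module):
    """Translate per-point features into a 2D image for one viewpoint."""

    def __init__(self, feat_dim=256, view_dim=3, img_ch=1):
        super().__init__()
        # Fuse each point-wise embedding with the viewpoint indicator
        # (here assumed to be a 3-D view direction) into view-conditioned features.
        self.view_fuse = nn.Sequential(
            nn.Linear(feat_dim + view_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Attention-style weights: one plausible way to "adaptively aggregate"
        # a view-specific global codeword from the point-wise features.
        self.attn = nn.Linear(feat_dim, 1)
        # 2D convolutional translation head: codeword -> 8x8 seed map -> image.
        self.to_seed = nn.Linear(feat_dim, feat_dim * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1),  # 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),        # 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(64, img_ch, 4, stride=2, padding=1),     # 64x64
            nn.Sigmoid(),
        )

    def forward(self, point_feats, viewpoint):
        # point_feats: (B, N, feat_dim) per-point embeddings from any backbone
        # viewpoint:   (B, view_dim) indicator of the rendering viewpoint
        B, N, _ = point_feats.shape
        view = viewpoint.unsqueeze(1).expand(B, N, -1)
        fused = self.view_fuse(torch.cat([point_feats, view], dim=-1))
        weights = torch.softmax(self.attn(fused), dim=1)        # (B, N, 1)
        codeword = (weights * fused).sum(dim=1)                 # (B, feat_dim)
        seed = self.to_seed(codeword).view(B, -1, 8, 8)
        return self.decoder(seed)                               # (B, img_ch, 64, 64)

# One pre-training step: regress the rendered image for a sampled viewpoint.
head = ViewSpecificTranslationHead()
point_feats = torch.randn(4, 1024, 256)   # stand-in backbone output
viewpoint = torch.randn(4, 3)             # stand-in viewpoint indicator
target = torch.rand(4, 1, 64, 64)         # rendered ground-truth view
loss = nn.functional.mse_loss(head(point_feats, viewpoint), target)
loss.backward()
```

The key design point the sketch tries to capture is that the aggregation is conditioned on the viewpoint before pooling, so the same point cloud yields a different global codeword, and hence a different generated image, for each sampled view.
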

Authors (2)
  1. Qijian Zhang (20 papers)
  2. Junhui Hou (138 papers)
Citations (9)
