3DCoMPaT++: An improved Large-scale 3D Vision Dataset for Compositional Recognition (2310.18511v2)

Published 27 Oct 2023 in cs.CV and cs.AI

Abstract: In this work, we present 3DCoMPaT++, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT++ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with both coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR2023, showcasing the winning method's use of a modified PointNet++ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D vision.
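The abstract notes that the winning CVPR2023 challenge entry trained a modified PointNet++ on 6D inputs. One plausible reading, given that the dataset ships RGB point clouds, is per-point XYZ coordinates concatenated with per-point RGB color. The sketch below illustrates only that input construction; the batch size, point count, and use of PyTorch are assumptions for illustration, not the authors' code.

```python
import torch

# Illustrative only: one plausible reading of "6D inputs" is XYZ
# point coordinates concatenated with per-point RGB color, since
# 3DCoMPaT++ provides colored point clouds. Shapes and the point
# count below are assumed, not taken from the winning entry.
batch_size, num_points = 4, 2048
xyz = torch.rand(batch_size, num_points, 3)  # point positions
rgb = torch.rand(batch_size, num_points, 3)  # per-point color in [0, 1]

# Concatenate along the feature axis to form the 6D per-point input
# that a PointNet++-style backbone would consume.
points_6d = torch.cat([xyz, rgb], dim=-1)
print(points_6d.shape)  # torch.Size([4, 2048, 6])
```

For the GCR task itself, a prediction would additionally pair each segmented part with a material label; the exact output format and evaluation metrics are defined in the paper rather than recoverable from this abstract.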

Authors (11)
  1. Habib Slim (3 papers)
  2. Xiang Li (1003 papers)
  3. Yuchen Li (85 papers)
  4. Mahmoud Ahmed (6 papers)
  5. Mohamed Ayman (2 papers)
  6. Ujjwal Upadhyay (8 papers)
  7. Ahmed Abdelreheem (8 papers)
  8. Arpit Prajapati (2 papers)
  9. Suhail Pothigara (1 paper)
  10. Peter Wonka (130 papers)
  11. Mohamed Elhoseiny (102 papers)
Citations (10)
