3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition (2310.18511v2)
Abstract: In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR 2023: the winning method uses a modified PointNet$^{++}$ model trained on 6D inputs, and we explore alternative techniques for improving GCR. We hope our work will help ease future research on compositional 3D vision.
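To make the GCR setup concrete, below is a minimal sketch (not the authors' released code) of how one stylized-shape sample and the "6D inputs" mentioned for the winning challenge entry might be organized. The `GCRSample` dataclass, its field names, and the reading of 6D inputs as per-point xyz + rgb are assumptions for illustration only.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class GCRSample:
    """One stylized shape as the GCR task describes it (field names are illustrative)."""
    shape_category: str              # one of the 41 shape classes, e.g. "chair"
    points: np.ndarray               # (N, 6) RGB point cloud: xyz + rgb per point
    part_labels: np.ndarray          # (N,) fine-grained part id per point (275 part classes)
    part_materials: dict             # part name -> material name (293 material classes)


def make_6d_input(xyz: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    """Concatenate coordinates and colors into 6D per-point features,
    assuming this is what "6D inputs" refers to for the PointNet++-style model."""
    assert xyz.shape == rgb.shape and xyz.shape[1] == 3
    return np.concatenate([xyz, rgb], axis=1)


# Toy example: a random 2048-point "chair" with two styled parts.
rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(2048, 3)).astype(np.float32)
rgb = rng.uniform(0.0, 1.0, size=(2048, 3)).astype(np.float32)
sample = GCRSample(
    shape_category="chair",
    points=make_6d_input(xyz, rgb),
    part_labels=rng.integers(0, 275, size=2048),
    part_materials={"seat": "leather", "leg": "metal"},
)
print(sample.points.shape)  # (2048, 6)
```

Under this reading, solving GCR on such a sample means jointly predicting the shape category, the per-point part segmentation, and the material applied to each part; the sketch only mirrors the ground-truth structure implied by the abstract, not any specific model or data-loading API.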
Authors: Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny