Part123: Part-aware 3D Reconstruction from a Single-view Image (2405.16888v1)

Published 27 May 2024 in cs.GR and cs.CV

Abstract: Recently, the emergence of diffusion models has opened up new opportunities for single-view reconstruction. However, all the existing methods represent the target object as a closed mesh devoid of any structural information, thus neglecting the part-based structure, which is crucial for many downstream applications, of the reconstructed shape. Moreover, the generated meshes usually suffer from large noises, unsmooth surfaces, and blurry textures, making it challenging to obtain satisfactory part segments using 3D segmentation techniques. In this paper, we present Part123, a novel framework for part-aware 3D reconstruction from a single-view image. We first use diffusion models to generate multiview-consistent images from a given image, and then leverage Segment Anything Model (SAM), which demonstrates powerful generalization ability on arbitrary objects, to generate multiview segmentation masks. To effectively incorporate 2D part-based information into 3D reconstruction and handle inconsistency, we introduce contrastive learning into a neural rendering framework to learn a part-aware feature space based on the multiview segmentation masks. A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models. Experiments show that our method can generate 3D models with high-quality segmented parts on various objects. Compared to existing unstructured reconstruction methods, the part-aware 3D models from our method benefit some important applications, including feature-preserving reconstruction, primitive fitting, and 3D shape editing.


Summary

  • The paper introduces a novel framework that integrates diffusion models, SAM, and contrastive learning for enhanced single-view 3D reconstruction.
  • It employs SyncDreamer-generated multiview images and SAM segmentation masks, together with a graph-based estimate of the part count, to recover object parts from a single input view.
  • The approach achieves competitive results on the Google Scanned Objects dataset, enabling precise segmentation-driven shape editing and feature preservation.

Insightful Overview of "Part123: Part-aware 3D Reconstruction from a Single-view Image"

The paper "Part123: Part-aware 3D Reconstruction from a Single-view Image" introduces a novel framework that advances the capabilities of single-view 3D reconstruction by incorporating part-aware segmentation into the pipeline. This research provides a new methodology for obtaining structural segmentation in 3D models, addressing significant challenges faced by existing methods which typically overlook the structural decomposition of objects.

Technical Contributions

The proposed approach, Part123, combines diffusion models with the Segment Anything Model (SAM) to generate multiview images and corresponding segmentation masks from a single input image. The primary innovation lies in integrating part-aware learning into the reconstruction process via contrastive learning, embedded within a neural rendering framework (NeuS) that jointly optimizes geometry and a part-aware feature field. Several key components of this framework stand out:

  1. Multiview Image Generation: SyncDreamer is used to generate multiview-consistent images from the single input view, laying the groundwork for reconstructing accurate 3D geometry from limited input.
  2. 2D Segmentation Integration: By employing SAM, the approach benefits from a robust and generalizable model that can generate segmentation masks even for complex and arbitrary objects, underpinning the part-aware aspect of the framework.
  3. Contrastive Learning for Part-awareness: A contrastive loss shapes the feature space of 3D points so that points observed under the same 2D segment are drawn together while points from different segments are pushed apart, effectively lifting 2D part information into 3D while tolerating inconsistencies across views (a minimal loss sketch follows this list).
  4. Automatic Part Segmentation: A clustering-based procedure derives 3D part segments from the learned feature field, and a graph-based algorithm evaluates correspondences between multiview segmentations to estimate the number of parts automatically, a non-trivial step that is crucial for accurate 3D part segmentation.
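
To make the contrastive component concrete, the following is a minimal sketch (not the authors' released code) of a supervised-contrastive, InfoNCE-style loss over per-pixel part features rendered by a NeuS-like model. The feature dimension, temperature, and pixel-sampling scheme are assumptions made for illustration.

```python
# Hedged sketch: pixels rendered from the same SAM segment are treated as
# positives, pixels from different segments as negatives.
import torch
import torch.nn.functional as F

def part_contrastive_loss(features, mask_ids, temperature=0.1):
    """InfoNCE-style loss over per-pixel part-aware features.

    features : (N, D) part features rendered for N sampled pixels of one view.
    mask_ids : (N,) integer id of the SAM segment each pixel falls in.
    """
    features = F.normalize(features, dim=-1)
    # Cosine similarity between every pair of sampled pixels.
    logits = features @ features.t() / temperature                      # (N, N)
    # Positive pairs: pixels lying in the same 2D segment (excluding self-pairs).
    same_segment = mask_ids.unsqueeze(0) == mask_ids.unsqueeze(1)       # (N, N)
    self_pair = torch.eye(len(mask_ids), dtype=torch.bool, device=features.device)
    positive = same_segment & ~self_pair
    # Row-wise log-softmax with self-similarity removed from the denominator.
    exp_logits = torch.exp(logits).masked_fill(self_pair, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True) + 1e-8)
    # Average the log-probability of the positives of each anchor pixel.
    pos_count = positive.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob * positive).sum(dim=1) / pos_count
    # Anchors with no positive partner in the sample contribute nothing.
    return per_anchor[positive.any(dim=1)].mean()
```

During training, such a term would presumably be summed over the sampled views alongside the usual NeuS color and eikonal losses; at inference, clustering the learned features of surface points (e.g., with k-means) yields the 3D part labels.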

Implications and Applications

The implications of this research are both theoretical and practical. Theoretically, it sheds light on how 2D segmentation concepts can be adapted for 3D models without the need for extensive 3D annotations. Practically, the resulting part-aware models have a diverse array of potential applications:

  • Feature-Preserving Reconstruction: The use of segmented parts strengthens applications that require the preservation of sharp geometrical features during model smoothing.
  • Primitive Fitting: The segmentation allows for efficient high-level abstraction of shapes through primitive fitting, a critical step in applications such as shape modeling (an illustrative fitting sketch follows this list).
  • Shape Editing: The framework facilitates sophisticated editing tasks, where components of 3D models can be replaced or articulated independently, enhancing user control and customization options in graphical applications.
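
As a rough illustration of the primitive-fitting application, the sketch below fits a plane to the vertices of one segmented part by least squares. This is a simplification for illustration only: the variable names (`vertices`, `part_labels`) and the choice of a plane primitive are assumptions, and the paper demonstrates richer per-part primitives.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to an (N, 3) array of part vertices.

    Returns the plane centroid, its unit normal, and the RMS point-to-plane distance.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    rms = np.sqrt(np.mean((centered @ normal) ** 2))
    return centroid, normal, rms

# Hypothetical usage: `vertices` and `part_labels` would come from the
# reconstructed, segmented mesh; one primitive is fit per part.
# for pid in np.unique(part_labels):
#     c, n, err = fit_plane(vertices[part_labels == pid])
```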

Experimentation and Results

The effectiveness of Part123 is demonstrated through comprehensive experiments on the Google Scanned Objects dataset, showing competitive performance against existing reconstruction methods such as SyncDreamer while adding the novel capability of part segmentation. A user study further confirmed that the segmentation aligns well with human perception, reinforcing the practical viability of the method.

Future Prospects

This work opens avenues for extending part-aware concepts to other domains within AI, such as autonomous robotics, where object manipulation requires an understanding of part composition. Moreover, as diffusion models evolve, integrating such frameworks with end-to-end generative models may enhance the fidelity and applicability of part-aware reconstructions. Continued research might also explore incorporating semantic understanding, potentially supported by broader datasets, to improve the robustness of the contrastive learning approach.

In summary, Part123 represents a significant advancement in the field of 3D reconstruction, offering insights into integrating part-awareness comprehensively and automatically, paving the way for enhanced application in various AI-driven domains.
