Generic 3D Diffusion Adapter Using Controlled Multi-View Editing (2403.12032v2)
Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the previous timestep into a coherent 3D representation and then conditions the 2D views of the next timestep on rendered views, without compromising visual quality. With an inference time of only 2-5 minutes, this framework achieves a better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extensible, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization.
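The abstract describes the sampling loop only at a high level. The following is a minimal toy sketch of that loop under stated assumptions, not the authors' implementation. Views are flat lists of floats. `toy_denoiser` is a hypothetical stand-in for an off-the-shelf 2D diffusion model that predicts a clean view. `lift_to_3d` and `render` are hypothetical placeholders for the 3D Adapter's reconstruction and rendering; here they reduce to per-pixel averaging, which only makes sense because every toy camera sees the same content. The linear beta schedule, and the choice to feed the rendered views into the DDPM posterior step as the x0 estimate, are also assumptions. The SDEdit-style part is that sampling starts from the input views noised to an intermediate timestep chosen by `strength`, rather than from pure noise.

```python
import math
import random
from typing import Callable, List, Tuple

View = List[float]  # a toy "image": one flat list of pixel values per view
# Hypothetical denoiser signature: (x_t, t, condition, view_index) -> predicted clean view x0
Denoiser = Callable[[View, int, View, int], View]


def make_schedule(num_steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[List[float], List[float]]:
    """Linear DDPM beta schedule (an assumption) and its cumulative products alpha-bar."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def lift_to_3d(views: List[View]) -> View:
    """Hypothetical 3D Adapter stand-in: fuse all views into one shared state by
    per-pixel averaging. A real system would fit a 3D representation instead."""
    n = len(views)
    return [sum(pixels) / n for pixels in zip(*views)]


def render(shared: View, num_views: int) -> List[View]:
    """Hypothetical renderer: in this toy, every camera sees the same shared state."""
    return [list(shared) for _ in range(num_views)]


def ancestral_step(x_t: View, x0: View, t: int, betas: List[float],
                   alpha_bars: List[float], rng: random.Random) -> View:
    """Sample x_{t-1} from the DDPM posterior q(x_{t-1} | x_t, x0)."""
    beta, ab_t = betas[t], alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef_x0 = math.sqrt(ab_prev) * beta / (1.0 - ab_t)
    coef_xt = math.sqrt(1.0 - beta) * (1.0 - ab_prev) / (1.0 - ab_t)
    std = math.sqrt(beta * (1.0 - ab_prev) / (1.0 - ab_t))
    return [coef_x0 * a + coef_xt * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x_t)]


def mvedit_sketch(input_views: List[View], denoiser: Denoiser, num_steps: int = 50,
                  strength: float = 0.6, seed: int = 0) -> List[View]:
    """SDEdit-style multi-view editing loop with a 3D-consistency step every timestep."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    n = len(input_views)
    # SDEdit: noise the inputs to an intermediate timestep instead of starting from pure noise.
    t0 = int(strength * (num_steps - 1))
    ab = alpha_bars[t0]
    xs = [[math.sqrt(ab) * p + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for p in v]
          for v in input_views]
    cond = render(lift_to_3d(input_views), n)  # initial 3D state built from the inputs
    for t in range(t0, -1, -1):
        # 1) Denoise each view independently with the 2D model, conditioned on renders.
        x0_preds = [denoiser(x, t, c, i) for i, (x, c) in enumerate(zip(xs, cond))]
        # 2) Lift the 2D predictions into one shared 3D state and re-render it.
        cond = render(lift_to_3d(x0_preds), n)
        # 3) Ancestral step using the rendered (consistent) views as the x0 estimate;
        #    the same renders condition the denoiser on the next iteration.
        xs = [ancestral_step(x, r, t, betas, alpha_bars, rng) for x, r in zip(xs, cond)]
    return xs


def toy_denoiser(x_t: View, t: int, cond: View, view_index: int) -> View:
    """Hypothetical 2D model whose predictions disagree slightly across views."""
    return [0.7 * c + 0.3 * x + 0.05 * view_index for x, c in zip(x_t, cond)]


if __name__ == "__main__":
    views = [[0.0, 1.0, 0.5], [0.2, 0.9, 0.4], [0.1, 1.1, 0.6]]
    out = mvedit_sketch(views, toy_denoiser)
    print(all(v == out[0] for v in out))  # True: the final views agree
```

Because the final ancestral step (t = 0) has zero variance and returns, up to rounding, its x0 estimate, and that estimate is a render of one shared state, all output views agree in this toy. With a real reconstructor, consistency would instead be only as good as the 3D fit.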