
MVDD: Multi-View Depth Diffusion Models (2312.04875v3)

Published 8 Dec 2023 in cs.CV

Abstract: Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet it remains a challenge to replicate its success in 3D shape generation. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality dense point clouds with 20K+ points with fine-grained details. To enforce 3D consistency in multi-view depth, we introduce an epipolar line segment attention that conditions the denoising step for a view on its neighboring views. Additionally, a depth fusion module is incorporated into diffusion steps to further ensure the alignment of depth maps. When augmented with surface reconstruction, MVDD can also produce high-quality 3D meshes. Furthermore, MVDD stands out in other tasks such as depth completion, and can serve as a 3D prior, significantly boosting many downstream tasks, such as GAN inversion. State-of-the-art results from extensive experiments demonstrate MVDD's excellent ability in 3D shape generation, depth completion, and its potential as a 3D prior for downstream tasks.

Authors (9)
  1. Zhen Wang (571 papers)
  2. Qiangeng Xu (20 papers)
  3. Feitong Tan (14 papers)
  4. Menglei Chai (37 papers)
  5. Shichen Liu (21 papers)
  6. Rohit Pandey (31 papers)
  7. Sean Fanello (27 papers)
  8. Achuta Kadambi (36 papers)
  9. Yinda Zhang (68 papers)
Citations (2)

Summary

  • The paper introduces a diffusion model that leverages multi-view depth to generate consistent 3D shapes while reducing computational load.
  • It integrates epipolar line segment attention and a depth fusion module to preserve cross-view consistency and fine-grained details.
  • Comprehensive experiments demonstrate MVDD's superior performance in 3D shape generation, completion, and as a robust 3D prior for tasks like GAN inversion.

Multi-View Depth Diffusion Models

Overview

Multi-View Depth Diffusion Models (MVDD) mark a significant step forward in 3D shape generation. Traditional denoising diffusion models excel at 2D image generation, yet extending that success to the 3D domain has been challenging. MVDD tackles this by using multi-view depth to represent complex 3D shapes in a 2D data format, which simplifies the denoising process. This representation enables the generation of dense point clouds (20K+ points) carrying highly detailed structures. MVDD enforces 3D consistency across views with a novel attention mechanism based on epipolar geometry, along with a depth fusion module that aligns depth maps during the diffusion process.
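The representation itself is easy to picture: each view stores a depth map, and the shape is recovered by back-projecting every depth map into a shared world frame. A minimal sketch, assuming a simple pinhole intrinsic matrix `K` and camera-to-world poses (the function names and camera conventions here are illustrative, not the paper's exact setup):

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an HxW depth map to world-space points with a pinhole camera."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                                  # camera-space rays
    pts_cam = rays * depth.reshape(-1, 1)                            # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]                           # map to world frame

def fuse_views(depths, Ks, poses):
    """Union the back-projected points of all views into one dense cloud."""
    return np.concatenate([backproject_depth(d, K, P)
                           for d, K, P in zip(depths, Ks, poses)], axis=0)
```

With many views of modest resolution, the union of back-projections easily yields the tens of thousands of points the paper reports, which is why the 2D depth format scales well here.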

Approach and Features

MVDD combines a multi-view depth representation with diffusion models to create high-quality 3D shapes. It uses epipolar line segment attention to make the denoising step for each view aware of the depth in neighboring views. Crucially, the model does not attend to the entire epipolar line but only to a short segment around the back-projected depth. This targeted attention reduces computational load while keeping the attended context relevant, making the approach both efficient and effective.
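The segment-restricted attention can be sketched as scaled dot-product attention over a short window of epipolar samples. Here `neighbor_feats` holds features sampled along the epipolar line in a neighboring view and `center_idx` marks the sample nearest the back-projected current depth estimate; these names and the single-query formulation are illustrative simplifications, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def epipolar_segment_attention(query, neighbor_feats, center_idx, half_width=2):
    """Attend only to a short window of epipolar samples centered on the
    location implied by the current depth estimate, not the full line."""
    lo = max(center_idx - half_width, 0)
    hi = min(center_idx + half_width + 1, len(neighbor_feats))
    keys = neighbor_feats[lo:hi]                     # (S, C): segment only
    scores = keys @ query / np.sqrt(len(query))      # scaled dot-product scores
    return softmax(scores) @ keys                    # weighted feature average
```

Restricting the keys to `2 * half_width + 1` samples is what makes the cost independent of image resolution along the epipolar line, which is the efficiency argument the paper makes.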

A depth fusion module integrated into the denoising process keeps the depth maps aligned across viewing angles, so that back-projecting the multiple depth maps yields a consistent 3D shape. MVDD can also serve as a 3D prior for downstream tasks such as GAN inversion, which is significant for improving results across various applications.
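One simple way to realize such a fusion step is to blend a view's predicted depth with depth reprojected from a neighboring view wherever the two agree within a tolerance, and keep the prediction where they disagree (e.g. at occlusions). A hedged sketch, with the threshold `tau` and blend weight `alpha` as assumed hyperparameters rather than values from the paper:

```python
import numpy as np

def fuse_depth(pred, reprojected, tau=0.05, alpha=0.5):
    """Blend predicted depth with a neighbor's reprojected depth where
    they are consistent (|diff| < tau); elsewhere keep the prediction."""
    consistent = np.abs(pred - reprojected) < tau
    fused = np.where(consistent, alpha * pred + (1 - alpha) * reprojected, pred)
    return fused, consistent
```

Running such a consistency-enforcing step inside each diffusion iteration, rather than once at the end, is what prevents the per-view denoisers from drifting toward incompatible depths.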

Contribution and Results

MVDD's key contributions lie in its multi-view depth representation within a generative setting, backed by a diffusion model. This choice of representation scales well for diffusion frameworks and simplifies data format handling. The model enforces cross-view consistency through epipolar line segment attention and depth fusion. Results from comprehensive experiments demonstrate MVDD's leading performance in tasks such as 3D shape generation and shape completion.

MVDD significantly outperforms prior art in both qualitative and quantitative evaluations. It generates fine-grained 3D shapes and effectively uses partially observed data to complete shapes. Moreover, it serves as a robust 3D prior for complex tasks, including 3D GAN inversion, where it enhances the reconstruction of shapes from novel viewpoints.

Conclusion

MVDD represents a considerable advancement in 3D shape generation. Its ability to capture intricate detail while keeping the generated depth maps consistent across views is notable. Its architecture presents a scalable, faithful, and versatile approach to 3D generative modeling, benefiting a broad range of applications and setting the stage for further innovation in the field.