Emergent Mind

Abstract

3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Figure: The network architecture adopts a U-Net with self-attention, fusing multi-view images into 3D Gaussians for novel-view rendering.

Overview

  • Introduces the Large Multi-View Gaussian Model (LGM) for creating high-resolution 3D models from text or images.

  • LGM uses multi-view Gaussian features and an asymmetric U-Net architecture for efficient training and high-resolution output.

  • Demonstrates significant improvements in 3D model fidelity and generation efficiency, reducing creation time to approximately 5 seconds.

  • Identifies future work on improving the consistency of the generated input views and points to potential applications in various fields.

Overview of the Paper

The Large Multi-View Gaussian Model (LGM) introduces a novel approach to creating high-resolution 3D models from text prompts or single-view images. Combining multi-view Gaussian features with an asymmetric U-Net backbone, LGM efficiently trains a feed-forward 3D reconstruction model from sparse views while avoiding the costly volumetric rendering and heavy transformer backbones of prior approaches.
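The two-stage flow described above (multi-view diffusion, then a U-Net that maps the views to fused Gaussians) can be sketched as follows. This is an illustrative mock-up, not the paper's released code: the function names, the 256/128 resolutions, and the stubbed random outputs are assumptions; only the four-view setup and the 14-channel Gaussian parameterization (position, opacity, scale, rotation, color) reflect the general design.

```python
import numpy as np

NUM_VIEWS = 4        # LGM operates on four multi-view images
GAUSS_CHANNELS = 14  # 3 pos + 1 opacity + 3 scale + 4 rotation + 3 RGB

def multiview_diffusion(prompt, res=256):
    """Stand-in for a multi-view diffusion model: returns NUM_VIEWS
    RGB images conditioned on a text prompt or single-view image."""
    rng = np.random.default_rng(0)
    return rng.random((NUM_VIEWS, res, res, 3), dtype=np.float32)

def asymmetric_unet(images, out_res=128):
    """Stand-in for the asymmetric U-Net: maps each input view to a
    lower-resolution map of per-pixel Gaussian features."""
    rng = np.random.default_rng(1)
    n = images.shape[0]
    return rng.standard_normal(
        (n, out_res, out_res, GAUSS_CHANNELS)).astype(np.float32)

def fuse_gaussians(feature_maps):
    """Fuse per-view Gaussian maps into one flat set for differentiable
    rendering: each output pixel of each view contributes one Gaussian."""
    return feature_maps.reshape(-1, GAUSS_CHANNELS)

views = multiview_diffusion("a wooden chair")
gaussians = fuse_gaussians(asymmetric_unet(views))
print(gaussians.shape)  # (4 * 128 * 128, 14) = (65536, 14)
```

In a real implementation the stubs would be a pretrained diffusion model and a trained U-Net, and the fused Gaussians would be rasterized with a differentiable Gaussian splatting renderer for supervision.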

Key Innovations

The paper posits that the inefficiencies of prior methods lie in their 3D representation and the heavy parameterization of their 3D backbones. Addressing these issues, LGM proposes multi-view Gaussian splatting as an expressive and computationally efficient 3D representation. The asymmetric U-Net operates as a high-throughput backbone that processes multi-view images into Gaussian features. This structure supports high-resolution training and circumvents the resolution bottleneck of previous triplane-based models. Notably, LGM maintains a generation time of about 5 seconds per 3D model while handling training resolutions up to 512.
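To make the "Gaussian features" representation concrete, the sketch below decodes a raw per-pixel 14-channel feature map into constrained 3D Gaussian attributes. The channel split (3 position, 1 opacity, 3 scale, 4 rotation quaternion, 3 color) is the standard Gaussian splatting parameterization; the specific activation functions and the 0.01 scale factor are assumptions for this sketch, not taken from the released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_gaussians(features):
    """features: (N, 14) raw network outputs -> dict of valid attributes."""
    pos = features[:, 0:3]                   # unbounded xyz position
    opacity = sigmoid(features[:, 3:4])      # squash to (0, 1)
    scale = np.exp(features[:, 4:7]) * 0.01  # positive scales (assumed factor)
    quat = features[:, 7:11]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternion
    rgb = sigmoid(features[:, 11:14])        # colors in (0, 1)
    return {"pos": pos, "opacity": opacity, "scale": scale,
            "rotation": quat, "rgb": rgb}

raw = np.random.default_rng(0).standard_normal((65536, 14)).astype(np.float32)
g = decode_gaussians(raw)
```

Because every attribute is produced by a differentiable map from network outputs, the whole set of Gaussians can be rendered and supervised end-to-end, which is what makes the representation efficient to train at high resolution.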

Empirical Results

Extensive experimental evaluations show that LGM improves the fidelity, resolution, and efficiency of 3D content generation. The paper reports that the method produces high-resolution, richly detailed 3D Gaussians in roughly 5 seconds from input, which compares favorably with existing techniques. User studies likewise rate LGM above baselines for image consistency and overall model quality.

Conclusion and Future Work

Concluding the paper, the authors highlight LGM's contributions to 3D content creation. It is especially notable for supporting both image-to-3D and text-to-3D generation and for its robust training pipeline. Nevertheless, the paper acknowledges a limitation: output quality depends on the input views produced by current multi-view diffusion models, which can be inconsistent across views. The authors suggest that future work focus on refining these diffusion models. Overall, LGM represents a significant step forward in high-resolution 3D asset generation, showcasing both versatility and potential for widespread application.
