Emergent Mind

Abstract

3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Figure: The network architecture adopts a U-Net with self-attention, fusing multi-view images into 3D Gaussians for novel-view rendering.

Overview

  • Introduces the Large Multi-View Gaussian Model (LGM) for creating high-resolution 3D models from text or images.

  • LGM uses multi-view Gaussian features and an asymmetric U-Net architecture for efficient training and high-resolution output.

  • Demonstrates significant improvements in 3D model fidelity and generation efficiency, reducing creation time to approximately 5 seconds.

  • Identifies future work on improving the consistency of the generated input views and points to potential applications in various fields.

Overview of the Paper

The Large Multi-View Gaussian Model (LGM) introduces a novel approach to creating high-resolution 3D models from text prompts or single-view images. Combining multi-view Gaussian features with an asymmetric U-Net backbone, LGM efficiently trains a feed-forward 3D reconstruction model from sparse views while avoiding the costly volumetric rendering and heavy transformer backbones of prior approaches.
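The two-stage flow described above (multi-view diffusion, then a U-Net that maps the views to fused Gaussians) can be sketched as follows. This is an illustrative mock-up, not the paper's released code: the function names, the 256/128 resolutions, and the stubbed random outputs are assumptions; only the four-view setup and the 14-channel Gaussian parameterization (position, opacity, scale, rotation, color) reflect the general design.

```python
import numpy as np

NUM_VIEWS = 4        # LGM operates on four multi-view images
GAUSS_CHANNELS = 14  # 3 pos + 1 opacity + 3 scale + 4 rotation + 3 RGB

def multiview_diffusion(prompt, res=256):
    """Stand-in for a multi-view diffusion model: returns NUM_VIEWS
    RGB images conditioned on a text prompt or single-view image."""
    rng = np.random.default_rng(0)
    return rng.random((NUM_VIEWS, res, res, 3), dtype=np.float32)

def asymmetric_unet(images, out_res=128):
    """Stand-in for the asymmetric U-Net: maps each input view to a
    lower-resolution map of per-pixel Gaussian features."""
    rng = np.random.default_rng(1)
    n = images.shape[0]
    return rng.standard_normal(
        (n, out_res, out_res, GAUSS_CHANNELS)).astype(np.float32)

def fuse_gaussians(feature_maps):
    """Fuse per-view Gaussian maps into one flat set for differentiable
    rendering: each output pixel of each view contributes one Gaussian."""
    return feature_maps.reshape(-1, GAUSS_CHANNELS)

views = multiview_diffusion("a wooden chair")
gaussians = fuse_gaussians(asymmetric_unet(views))
print(gaussians.shape)  # (4 * 128 * 128, 14) = (65536, 14)
```

In a real implementation the stubs would be a pretrained diffusion model and a trained U-Net, and the fused Gaussians would be rasterized with a differentiable Gaussian splatting renderer for supervision.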

Key Innovations

The paper posits that the inefficiencies of prior methods lie in their 3D representation and the heavy parameterization of their 3D backbones. Addressing these issues, LGM proposes multi-view Gaussian splatting as an expressive and computationally efficient 3D representation. The asymmetric U-Net operates as a high-throughput backbone that processes multi-view images into Gaussian features. This structure supports high-resolution training and circumvents the resolution bottleneck of previous triplane-based models. Notably, LGM maintains a generation time of about 5 seconds per 3D model while handling training resolutions up to 512.
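To make the "Gaussian features" representation concrete, the sketch below decodes a raw per-pixel 14-channel feature map into constrained 3D Gaussian attributes. The channel split (3 position, 1 opacity, 3 scale, 4 rotation quaternion, 3 color) is the standard Gaussian splatting parameterization; the specific activation functions and the 0.01 scale factor are assumptions for this sketch, not taken from the released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_gaussians(features):
    """features: (N, 14) raw network outputs -> dict of valid attributes."""
    pos = features[:, 0:3]                   # unbounded xyz position
    opacity = sigmoid(features[:, 3:4])      # squash to (0, 1)
    scale = np.exp(features[:, 4:7]) * 0.01  # positive scales (assumed factor)
    quat = features[:, 7:11]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternion
    rgb = sigmoid(features[:, 11:14])        # colors in (0, 1)
    return {"pos": pos, "opacity": opacity, "scale": scale,
            "rotation": quat, "rgb": rgb}

raw = np.random.default_rng(0).standard_normal((65536, 14)).astype(np.float32)
g = decode_gaussians(raw)
```

Because every attribute is produced by a differentiable map from network outputs, the whole set of Gaussians can be rendered and supervised end-to-end, which is what makes the representation efficient to train at high resolution.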

Empirical Results

Extensive experimental evaluations show that LGM improves the fidelity, resolution, and efficiency of 3D content generation. The paper reports that the method produces high-resolution, richly detailed 3D Gaussians in roughly 5 seconds from input, which compares favorably with existing techniques. User studies likewise rate LGM above baselines for image consistency and overall model quality.

Conclusion and Future Work

Concluding the paper, the authors highlight LGM's contributions to 3D content creation. It is especially notable for supporting both image-to-3D and text-to-3D generation and for its robust training pipeline. Nevertheless, the paper acknowledges a limitation: output quality depends on the input views produced by current multi-view diffusion models, which can be inconsistent across views. The authors suggest that future work focus on refining these diffusion models. Overall, LGM represents a significant step forward in high-resolution 3D asset generation, showcasing both versatility and potential for widespread application.
