
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (2402.05054v1)

Published 7 Feb 2024 in cs.CV

Abstract: 3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Citations (208)

Summary

  • The paper presents a novel multi-view Gaussian feature representation that speeds up high-resolution 3D content creation.
  • The paper employs an asymmetric U-Net backbone to fuse multi-view data efficiently, enhancing detail and fidelity in generated models.
  • The paper introduces a mesh extraction algorithm that converts Gaussian splats to polygonal meshes, facilitating applications in VR, gaming, and animation.

The query corresponds to the research paper "LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation" (arXiv ID: 2402.05054v1, published 7 Feb 2024). The paper introduces the Large Multi-View Gaussian Model (LGM), a framework for generating high-resolution 3D content from text prompts or single-view images. Its key concepts and findings are as follows:

  1. Motivation and Challenges:
    • The paper addresses the limitations in existing 3D content creation techniques, which typically suffer from resolution constraints and computational inefficiencies.
    • Previous methods, like triplane-based neural radiance fields (NeRF), are limited in terms of resolution due to computationally expensive operations and complex backbone architectures.
  2. Proposed Methodology:
    • Multi-View Gaussian Features: LGM introduces multi-view Gaussian features as an efficient 3D representation. This approach allows high-resolution 3D models to be generated swiftly (in about 5 seconds) by fusing features from multiple views.
    • Asymmetric U-Net Backbone: The paper proposes an asymmetric U-Net-based architecture that processes multi-view images to predict Gaussian features. This configuration facilitates end-to-end learning and improves throughput without relying heavily on transformers.
    • Training Enhancements: The authors utilize specific data augmentation strategies to ensure the model is robust across different input conditions, simulating inconsistencies in multi-view images generated by off-the-shelf models.
  3. Experiments and Results:
    • LGM demonstrates state-of-the-art performance in high-resolution 3D model generation with an ability to handle both image-to-3D and text-to-3D tasks.
    • Comparative studies highlight LGM’s superiority in producing detailed and high-fidelity representations compared to other methods, especially in handling challenging and diverse inputs.
  4. Mesh Extraction Algorithm:
    • The authors present a novel algorithm to convert the generated 3D Gaussian splats to polygonal meshes. This conversion supports downstream applications whose rendering pipelines require meshes.
  5. Applications and Implications:
    • The framework shows promise for efficient, high-resolution 3D content creation, applicable in fields such as gaming, virtual reality, and animation.
    • By optimizing both the representation and the learning framework, the method achieves a significant speed-up in content creation while maintaining or even enhancing quality.
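
The fusion step described under "Multi-View Gaussian Features" can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fuse_multiview_gaussians`, the 14-channel layout (position, opacity, scale, rotation quaternion, color), and the activation functions are assumptions chosen for illustration. The idea shown is only that every pixel of every view's feature map is treated as one Gaussian, and the per-view sets are concatenated into a single set that a differentiable renderer could consume.

```python
import numpy as np

# Assumed per-pixel channel layout (hypothetical, for illustration only):
# 3 position + 1 opacity + 3 scale + 4 rotation quaternion + 3 color = 14
CHANNELS = 14


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_multiview_gaussians(features):
    """Turn per-view feature maps into one flat set of Gaussians.

    features: array of shape (V, H, W, 14), one feature map per view.
    Returns a dict of arrays, each with N = V * H * W rows.
    """
    v, h, w, c = features.shape
    if c != CHANNELS:
        raise ValueError(f"expected {CHANNELS} channels, got {c}")

    # Fusion here is plain concatenation: every pixel of every view
    # becomes one Gaussian in the combined set.
    flat = features.reshape(v * h * w, c)

    quat = flat[:, 7:11]
    quat_norm = np.linalg.norm(quat, axis=1, keepdims=True)

    return {
        "position": flat[:, 0:3],
        "opacity": sigmoid(flat[:, 3:4]),     # squash to (0, 1)
        "scale": np.exp(flat[:, 4:7]),        # keep scales positive
        "rotation": quat / (quat_norm + 1e-8),  # unit quaternion
        "color": sigmoid(flat[:, 11:14]),     # squash to (0, 1)
    }


# Usage with random stand-in features for 4 views of 32 x 32 pixels.
rng = np.random.default_rng(0)
gaussians = fuse_multiview_gaussians(rng.normal(size=(4, 32, 32, CHANNELS)))
print(gaussians["position"].shape)  # (4096, 3)
```

Because the number of Gaussians grows with the number of views times the feature-map resolution, the output resolution of the backbone directly controls how dense the fused representation is.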

In summary, the paper contributes a significant advancement in the domain of automatic 3D content generation, balancing efficiency, speed, and fidelity. Its novel use of Gaussian splatting alongside U-Net architectures represents a practical approach for overcoming existing bottlenecks in 3D model generation technologies.
