Emergent Mind

Abstract

We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/.

Overview

  • GRM is a new approach to 3D reconstruction from sparse-view images that uses pixel-aligned Gaussians to make the process rapid and efficient.

  • It leverages a transformer-based architecture for converting pixels to 3D Gaussians, ensuring high-quality and consistent reconstructions.

  • GRM achieves state-of-the-art performance in object-level 3D reconstruction and in generative tasks when combined with multi-view diffusion models.

  • Future work could improve GRM by addressing its limitations in hallucinating unseen regions through probabilistic frameworks.

Exploring Efficient 3D Reconstruction and Generation with GRM: A Large Gaussian Reconstruction Model

Introduction to GRM

The recently introduced Gaussian Reconstruction Model (GRM) presents an innovative approach to reconstructing 3D assets from sparse-view images, reducing the time required for this process to approximately 0.1s. The model uses a transformer-based architecture to efficiently handle multi-view information, translating input pixels into pixel-aligned Gaussians. These Gaussians are then unprojected along camera rays to form a densely distributed set of 3D Gaussians representing the scene. Together, the transformer architecture and the Gaussian representation make the framework scalable and efficient, and GRM demonstrates superior reconstruction quality and speed compared to alternative methods. Its potential also extends to generative tasks, including text-to-3D and image-to-3D, by integrating with existing multi-view diffusion models.
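The unprojection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the network predicts a per-pixel depth, and the Gaussian centers are obtained by scaling each pixel's camera ray by that depth and mapping the result to world space. The function name and the choice of pixel-center offset (+0.5) are assumptions for illustration.

```python
import numpy as np

def unproject_pixel_gaussians(depth, K, cam_to_world):
    """Unproject a per-pixel depth map into 3D Gaussian centers.

    depth: (H, W) depth along each camera ray (a network output in this sketch).
    K: (3, 3) camera intrinsics; cam_to_world: (4, 4) camera-to-world pose.
    Returns (H*W, 3) world-space Gaussian centers, one per pixel.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid
    # Homogeneous pixel coordinates, sampled at pixel centers.
    pix = np.stack([u + 0.5, v + 0.5, np.ones_like(u, dtype=float)], axis=-1)
    rays_cam = pix @ np.linalg.inv(K).T             # camera-space ray directions
    pts_cam = rays_cam * depth[..., None]           # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((H, W, 1))], axis=-1)
    pts_world = pts_h @ cam_to_world.T              # rigid transform to world space
    return pts_world[..., :3].reshape(-1, 3)
```

With one Gaussian per input pixel, four 512x512 views already yield over a million Gaussians, which is what makes the resulting point set "densely distributed."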

Technical Overview

GRM replaces the conventional triplane scene representation with 3D Gaussians, avoiding the inefficiency of volume rendering. Its architecture comprises two novel components: a representation built on pixel-aligned 3D Gaussians, and a purely transformer-based architecture for converting pixels to 3D Gaussians. This design captures highly detailed spatial features and encourages consistency across different views, a crucial factor for high-quality reconstruction.
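The pixel-to-Gaussian conversion ends with a head that splits per-pixel features into Gaussian attributes. The sketch below illustrates one plausible parameterization; the channel layout (depth, anisotropic scale, rotation quaternion, opacity, color) and the activation choices are assumptions, not the paper's exact design:

```python
import numpy as np

def features_to_gaussians(feat):
    """Split per-pixel features into per-Gaussian attributes.

    feat: (N, 12) raw per-pixel features. Channel layout here is an
    illustrative assumption: 1 depth, 3 scale, 4 rotation, 1 opacity, 3 color.
    """
    depth = np.exp(feat[:, 0:1])                    # positive depth along the ray
    scale = np.exp(feat[:, 1:4])                    # positive anisotropic scales
    quat = feat[:, 4:8]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternion
    opacity = 1.0 / (1.0 + np.exp(-feat[:, 8:9]))   # sigmoid into (0, 1)
    color = 1.0 / (1.0 + np.exp(-feat[:, 9:12]))    # sigmoid into (0, 1)
    return depth, scale, quat, opacity, color
```

Constraining each attribute to its valid range (positive scales, unit rotations, opacities in (0, 1)) is what lets the predicted set be splatted directly with a standard Gaussian rasterizer.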

Core Contributions

  • Efficient Framework: GRM introduces a feed-forward 3D generative model focused on 3D Gaussian splatting, enabling rapid and high-quality 3D reconstruction.
  • Transformer-based Sparse-View Reconstructor: A transformer architecture, including an encoder and an innovative upsampler, is employed for efficient pixel-to-3D Gaussian translation.
  • State-of-the-Art Quality and Speed: For object-level 3D reconstruction and when combined with multi-view diffusion models for generative tasks, GRM sets new benchmarks in quality and inference speed.
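The upsampler mentioned above increases the spatial resolution of the feature maps so that more pixel-aligned Gaussians can be produced. One common primitive for this kind of feature upsampling is a pixel shuffle, sketched below; whether GRM's upsampler uses exactly this rearrangement is an assumption for illustration:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Trades channel depth for spatial resolution without any learned
    parameters; learned layers before/after would refine the result.
    """
    C2, H, W = x.shape
    C = C2 // (r * r)
    x = x.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)   # interleave: (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

Each r x r block of output pixels is filled from r*r channels of the corresponding input pixel, so resolution grows by r in each dimension.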

Empirical Results

Extensive experimental analyses underscore GRM's ability to outperform existing methods. For instance, in sparse-view 3D reconstruction from four images, GRM achieves higher PSNR and SSIM and lower LPIPS than prior methods while maintaining fast inference. Similarly, in text-to-3D and image-to-3D generation, GRM, coupled with appropriate multi-view diffusion models, exhibits superior performance across quality metrics and user studies.
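For reference, the PSNR metric cited above is a simple function of the mean squared error between a rendered view and its ground truth; higher is better. The helper name and the [0, 1] pixel range are assumptions:

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images.

    pred, gt: arrays of matching shape with pixel values in [0, max_val].
    """
    mse = np.mean((pred - gt) ** 2)
    return 20 * np.log10(max_val) - 10 * np.log10(mse)
```

A uniform per-pixel error of 0.1 on a [0, 1] image gives an MSE of 0.01 and thus a PSNR of 20 dB; state-of-the-art sparse-view reconstructions typically score well above that.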

Future Directions

Despite its achievements, GRM's reliance on input images limits its capacity to hallucinate unseen regions, pointing to potential areas for improvement in future work. Exploring probabilistic frameworks or incorporating generative completion of unobserved regions could enhance GRM's versatility and reconstruction quality.

Concluding Remarks

The Gaussian Reconstruction Model (GRM) represents a significant step forward in the realm of 3D reconstruction and generation. By efficiently transforming sparse-view images into high-fidelity 3D assets and seamlessly integrating with diffusion models for generative tasks, it opens new avenues in digital content creation. Its exemplary performance, underscored by rigorous experimental validation, showcases the transformative potential of combining advanced neural architectures with 3D Gaussian representations.
