Emergent Mind

MeshLRM: Large Reconstruction Model for High-Quality Mesh

(2404.12385)
Published Apr 18, 2024 in cs.CV and cs.GR

Abstract

We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a pre-trained NeRF LRM with mesh rendering. Moreover, we improve the LRM architecture by simplifying several complex designs in previous LRMs. MeshLRM's NeRF initialization is sequentially trained with low- and high-resolution images; this new LRM training strategy enables significantly faster convergence and thereby leads to better quality with less compute. Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also allows for many downstream applications, including text-to-3D and single-image-to-3D generation. Project page: https://sarahweiii.github.io/meshlrm/

Figure: Comparison of MeshLRM with other methods, highlighting differences from 'In3D-LRM' in Instant3D.

Overview

  • MeshLRM is a novel framework for high-quality mesh reconstruction from sparse inputs (as few as four images), built on the Large Reconstruction Model (LRM) paradigm and designed for efficiency.

  • The model integrates differentiable mesh extraction and rendering for end-to-end training, simplifies the transformer-based LRM architecture, and adopts a training strategy that progresses from low- to high-resolution images.

  • Techniques including Differentiable Marching Cubes, differentiable rasterization, and a novel ray opacity loss are combined to improve surface detail, rendering quality, and the stability of the reconstructed volume.

  • MeshLRM demonstrates superior performance in generating high-fidelity 3D meshes from sparse views, outperforming prior models in both texture and geometry accuracy.

MeshLRM: A Large Reconstruction Model for High-Quality Mesh Generation from Sparse-View Inputs

Introduction to MeshLRM

MeshLRM introduces a novel framework that leverages the capabilities of Large Reconstruction Models (LRMs), specifically tailored for high-quality mesh reconstruction from sparse-view inputs. Unlike traditional approaches that require extensive input data and processing time, MeshLRM efficiently produces detailed 3D meshes suitable for immediate use in downstream applications, including text-to-3D and single-image-to-3D generation.

Key Contributions

  • Integrated Differentiable Mesh Extraction: MeshLRM is designed from the ground up to support end-to-end training and mesh extraction. By integrating differentiable surface extraction directly into the LRM framework, the model optimizes surface detail and fidelity directly, rather than through post-processing steps that can degrade quality.
  • Innovative Training Approach: The model training utilizes a novel strategy, beginning with low-resolution initial training followed by high-resolution fine-tuning. This method allows for faster convergence and reduced computational demand.
  • Simplified Architecture: The architecture of MeshLRM simplifies many aspects of traditional LRMs, dispensing with complex pre-trained modules in favor of a more streamlined transformer-based setup that processes image and triplane tokens efficiently.
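The streamlined transformer design above operates on a joint sequence of image tokens and triplane tokens. As a rough illustration of the data flow, the sketch below shows how a triplane (three feature planes for the XY, XZ, and YZ axes) can be flattened into a token sequence for a transformer and folded back afterward; the plane resolution and channel count here are placeholder values, not MeshLRM's actual configuration.

```python
import numpy as np

# Illustrative shapes only; MeshLRM's real token sizes differ.
NUM_PLANES, RES, CHANNELS = 3, 32, 16  # XY, XZ, YZ feature planes

def triplane_to_tokens(triplane):
    """Flatten a (planes, RES, RES, C) triplane into a (planes*RES*RES, C)
    token sequence that a transformer can attend over jointly with image tokens."""
    p, h, w, c = triplane.shape
    return triplane.reshape(p * h * w, c)

def tokens_to_triplane(tokens, planes=NUM_PLANES, res=RES):
    """Inverse operation: fold the transformer's output tokens back into planes."""
    c = tokens.shape[-1]
    return tokens.reshape(planes, res, res, c)

triplane = np.random.randn(NUM_PLANES, RES, RES, CHANNELS)
tokens = triplane_to_tokens(triplane)
assert tokens.shape == (NUM_PLANES * RES * RES, CHANNELS)
assert np.allclose(tokens_to_triplane(tokens), triplane)
```

The round trip above is lossless, which is the property that lets the transformer update triplane features token-by-token before they are decoded into density and color fields.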

Combination of Techniques

MeshLRM innovatively merges various techniques:

  1. Differentiable Marching Cubes (DiffMC): In contrast to traditional NeRF-to-mesh conversions, MeshLRM uses DiffMC to extract mesh surfaces directly from a learned density field, optimizing the entire process toward preserving high-resolution details.
  2. Differentiable Rasterization: Used during the second stage of training to render the extracted meshes, enabling the model to fine-tune based on realistic lighting and view-dependent effects.
  3. Ray Opacity Loss: A novel loss function ensuring that empty spaces in the 3D field maintain near-zero densities, improving model stability and accuracy in reconstructed volumes.
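To make the ray opacity loss concrete: under standard volume rendering, a ray's accumulated opacity is O = 1 - prod_i exp(-sigma_i * delta_i), and penalizing O on rays known to pass through empty space drives the densities there toward zero. The sketch below uses a mean-absolute penalty on the masked rays; this is an illustrative choice, not necessarily the exact loss form used in the paper.

```python
import numpy as np

def ray_opacity(sigmas, deltas):
    """Accumulated opacity of each ray under standard volume rendering:
    O = 1 - prod_i exp(-sigma_i * delta_i), with per-sample densities
    `sigmas` and step sizes `deltas` along the last axis."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    return 1.0 - np.prod(1.0 - alphas, axis=-1)

def ray_opacity_loss(sigmas, deltas, is_empty):
    """Penalize accumulated opacity on rays flagged as empty space,
    pushing the density field toward near-zero there. `is_empty` is a
    boolean mask over rays; the mean-absolute form is illustrative."""
    opacity = ray_opacity(sigmas, deltas)
    if not np.any(is_empty):
        return 0.0
    return float(np.mean(opacity[is_empty]))

# Two rays with four samples each: one through empty space, one hitting a surface.
sigmas = np.array([[0.0, 0.0, 0.0, 0.0],    # empty ray: zero density everywhere
                   [0.0, 5.0, 9.0, 0.0]])   # occupied ray: dense mid-samples
deltas = np.full_like(sigmas, 0.1)
loss = ray_opacity_loss(sigmas, deltas, is_empty=np.array([True, False]))
assert loss == 0.0  # the empty ray already has zero opacity, so no penalty
```

If spurious density leaked into the empty ray, its accumulated opacity would rise above zero and the loss would push it back down, which is the stabilizing effect described above.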

Experimental Setup and Results

MeshLRM was rigorously tested against various benchmarks, demonstrating superior performance in both synthetic and real-world scenarios. The model consistently achieved higher fidelity in texture and geometry than existing solutions, underpinned by its efficient training and inference. Its two-stage training process allows it to perform exceptionally well in sparse-view settings where previous models have faltered.

Future Outlook and Applications

The scalability and efficiency of MeshLRM potentially make it a cornerstone for future developments in automated 3D content creation. Its ability to generate high-quality meshes from minimal inputs paves the way for innovative applications in virtual reality, gaming, and beyond. The integration capability with other generative models also suggests that MeshLRM could serve as a foundational technology for emergent AI-driven design tools.

In summary, MeshLRM sets a new standard for mesh reconstruction quality and operational efficiency, making high-fidelity 3D asset creation more accessible and less resource-intensive.
