- The paper introduces MeshLRM, a novel model that integrates differentiable mesh extraction and a two-stage training process to generate high-quality 3D reconstructions from sparse views.
- It employs techniques such as Differentiable Marching Cubes, differentiable rasterization, and a ray opacity loss to enhance texture fidelity and geometric detail.
- The streamlined, transformer-based architecture reduces computational demand, making the resulting meshes immediately usable in applications such as text-to-3D and single-image-to-3D generation.
MeshLRM: A Large Reconstruction Model for High-Quality Mesh Generation from Sparse-View Inputs
Introduction to MeshLRM
MeshLRM introduces a framework that tailors the capabilities of Large Reconstruction Models (LRMs) to high-quality mesh reconstruction from sparse-view inputs. Unlike traditional approaches, which require extensive input data and long processing times, MeshLRM efficiently produces detailed 3D meshes suitable for immediate use in downstream applications, including text-to-3D and single-image-to-3D generation.
Key Contributions
- Integrated Differentiable Mesh Extraction: MeshLRM is designed from the ground up for end-to-end training and mesh extraction. By integrating differentiable surface extraction directly into the LRM framework, the model optimizes surface detail and fidelity directly, rather than relying on post-processing steps that can degrade quality.
- Innovative Training Approach: Training follows a two-step resolution schedule, beginning with low-resolution pre-training followed by high-resolution fine-tuning. This schedule yields faster convergence and reduces computational demand.
- Simplified Architecture: MeshLRM simplifies many aspects of traditional LRMs, dispensing with complex pre-trained modules in favor of a streamlined transformer that jointly processes image and triplane tokens (see the sketch after this list).
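To make the token-based design concrete, here is a minimal PyTorch sketch of a transformer that processes image-patch tokens together with learnable triplane tokens. The class name, dimensions, and layer counts are illustrative assumptions, not MeshLRM's published configuration.

```python
import torch
import torch.nn as nn

class TriplaneTransformer(nn.Module):
    """Minimal sketch: image-patch tokens and learnable triplane tokens are
    concatenated into one sequence and refined by a plain transformer.
    All sizes here are illustrative, not MeshLRM's configuration."""

    def __init__(self, dim=512, depth=6, heads=8, plane_res=32):
        super().__init__()
        self.plane_res = plane_res
        # Learnable queries that the transformer turns into triplane features.
        self.triplane_tokens = nn.Parameter(
            torch.randn(3 * plane_res * plane_res, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim), e.g. patchified + linearly embedded views.
        B = image_tokens.shape[0]
        tri = self.triplane_tokens.unsqueeze(0).expand(B, -1, -1)
        x = self.encoder(torch.cat([tri, image_tokens], dim=1))
        # Keep the refined triplane tokens and fold them into 3 feature planes.
        tri = x[:, : tri.shape[1]]
        return tri.reshape(B, 3, self.plane_res, self.plane_res, -1)

# Usage: tokens from 4 views of 16x16 patches -> (1, 1024, 512) input.
planes = TriplaneTransformer()(torch.randn(1, 4 * 256, 512))
print(planes.shape)  # torch.Size([1, 3, 32, 32, 512])
```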
Combination of Techniques
MeshLRM combines several techniques:
- Differentiable Marching Cubes (DiffMC): In contrast to traditional NeRF-to-mesh conversions, MeshLRM uses DiffMC to extract mesh surfaces directly from a learned density field, optimizing the entire pipeline to preserve high-resolution detail (first sketch below).
- Differentiable Rasterization: Used during the second stage of training to render the extracted meshes, so the model can be fine-tuned directly against rendered images (second sketch below).
- Ray Opacity Loss: A loss that drives the accumulated opacity of rays through empty space toward zero, keeping empty regions of the 3D field at near-zero density and improving stability and accuracy in the reconstructed volumes (third sketch below).
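The density-field-to-mesh step can be sketched as follows: decode the predicted triplanes into a dense density grid, then extract an iso-surface with marching cubes. This sketch uses scikit-image's classic, non-differentiable marching cubes for clarity; MeshLRM's DiffMC is a differentiable replacement for that call, letting gradients flow from mesh vertices back into the density field. The decode_density function is a hypothetical stand-in for the model's triplane decoder.

```python
import torch
from skimage import measure  # classic, non-differentiable marching cubes

def decode_density(planes: torch.Tensor, res: int = 64) -> torch.Tensor:
    """Hypothetical stand-in for the triplane decoder: maps predicted
    feature planes to a dense (res, res, res) density grid."""
    g = torch.nn.functional.interpolate(
        planes.mean(-1), size=(res, res), mode="bilinear", align_corners=False)
    # Sum the three projected planes into a toy volume, mimicking how
    # triplane features are gathered for a 3D point before decoding.
    xy, xz, yz = g[:, 0], g[:, 1], g[:, 2]
    return (xy[:, :, :, None] + xz[:, :, None, :] + yz[:, None, :, :]) / 3.0

planes = torch.randn(1, 3, 32, 32, 512)
density = decode_density(planes)[0]  # (64, 64, 64)

# Extract the iso-surface at a chosen density threshold. MeshLRM swaps in a
# differentiable marching-cubes layer here so vertex positions carry
# gradients back to the density field.
verts, faces, _, _ = measure.marching_cubes(
    density.numpy(), level=density.mean().item())
print(verts.shape, faces.shape)
```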
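For differentiable rasterization, the sketch below uses nvdiffrast, a widely used differentiable rasterizer (an assumption for illustration, not a confirmed implementation detail of MeshLRM). An image-space loss on the rendered pixels back-propagates to the vertex positions and colors, which is what allows stage-two fine-tuning to refine the extracted mesh.

```python
import torch
import nvdiffrast.torch as dr

# Toy mesh: one triangle with per-vertex colors. Positions require gradients
# so an image-space loss can update the geometry. Requires a CUDA device.
pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],     # clip-space xyzw
                     [ 0.8, -0.8, 0.0, 1.0],
                     [ 0.0,  0.8, 0.0, 1.0]]], device="cuda", requires_grad=True)
tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device="cuda")
col = torch.tensor([[[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]]], device="cuda", requires_grad=True)

glctx = dr.RasterizeCudaContext()
rast, _ = dr.rasterize(glctx, pos, tri, resolution=[256, 256])
color, _ = dr.interpolate(col, rast, tri)    # barycentric interpolation
color = dr.antialias(color, rast, pos, tri)  # gradients across silhouette edges

target = torch.zeros_like(color)             # placeholder ground-truth image
loss = torch.nn.functional.mse_loss(color, target)
loss.backward()                              # gradients w.r.t. pos and col
print(pos.grad.shape, col.grad.shape)
```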
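Finally, the ray opacity loss as described above admits a compact implementation: accumulate opacity along each ray from per-sample densities with standard volume-rendering compositing, then penalize any accumulated opacity on rays that should traverse empty space (for example, rays through background pixels). Variable names and the empty-ray mask convention are assumptions.

```python
import torch

def ray_opacity_loss(sigma: torch.Tensor, deltas: torch.Tensor,
                     empty_mask: torch.Tensor) -> torch.Tensor:
    """sigma:      (R, S) per-sample densities along R rays, S samples each.
    deltas:     (R, S) distances between consecutive samples.
    empty_mask: (R,) boolean, True for rays that should hit nothing.
    Compositing: alpha_i = 1 - exp(-sigma_i * delta_i) and the ray's
    opacity is 1 - prod_i(1 - alpha_i); empty rays are pushed toward 0."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                      # (R, S)
    transmittance = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)[:, -1]
    ray_opacity = 1.0 - transmittance                             # (R,)
    return ray_opacity[empty_mask].mean()

# Usage on toy data: 1024 rays with 64 samples, every other ray marked empty.
sigma = torch.rand(1024, 64, requires_grad=True)
deltas = torch.full((1024, 64), 0.01)
empty = torch.arange(1024) % 2 == 0
ray_opacity_loss(sigma, deltas, empty).backward()
```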
Experimental Setup and Results
MeshLRM was rigorously evaluated against a range of benchmarks, demonstrating superior performance in both synthetic and real-world scenarios. It consistently achieves higher texture and geometry fidelity than existing solutions, underpinned by efficient training and inference. Its two-stage training process lets it handle sparse-view settings where previous models struggle.
Future Outlook and Applications
The scalability and efficiency of MeshLRM position it as a potential cornerstone for future work in automated 3D content creation. Its ability to generate high-quality meshes from minimal inputs paves the way for applications in virtual reality, gaming, and beyond, while its compatibility with other generative models suggests it could serve as foundational technology for emerging AI-driven design tools.
In summary, MeshLRM sets a new standard for mesh reconstruction quality and operational efficiency, making high-fidelity 3D asset creation more accessible and less resource-intensive.