MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

(2406.10163)
Published Jun 14, 2024 in cs.CV and cs.AI

Abstract

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

MeshAnything: Autoregressive transformer generating meshes from 3D shapes via encoded point cloud features.

Overview

  • The paper introduces MeshAnything, a method for generating artist-created meshes from various 3D representations using shape-conditioned autoregressive transformers.

  • MeshAnything combines a Vector Quantized Variational Autoencoder (VQ-VAE) with a shape-conditioned decoder-only transformer to improve the quality and efficiency of mesh generation.

  • Experimental results show that MeshAnything significantly reduces the number of faces in meshes while maintaining high quality, with enhancements like a noise-resistant decoder improving overall robustness.

Overview of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"

"MeshAnything" addresses a critical bottleneck in the 3D industry by presenting a method to generate Artist-Created Meshes (AMs) from various 3D representations using shape-conditioned autoregressive transformers. This paper introduces a novel perspective by treating mesh extraction as a generation problem rather than a reconstruction one, facilitating the replacement of manually crafted 3D assets with automatically generated ones.

Key Contributions

  1. Shape-Conditioned AM Generation: The paper proposes Shape-Conditioned AM Generation, a new formulation that emphasizes creating meshes that mimic those produced by human artists. Previous methods focused on reconstruction-based mesh extraction, leading to inefficiencies due to dense meshes with poor topology.

  2. MeshAnything Framework: MeshAnything combines a Vector Quantized Variational Autoencoder (VQ-VAE) with a shape-conditioned decoder-only transformer. This two-stage architecture first learns a mesh vocabulary using the VQ-VAE and then trains the transformer on that vocabulary for shape-conditioned autoregressive mesh generation (a minimal sketch of the resulting inference flow follows this list).

  3. Noise-Resistant Decoder: To enhance mesh generation quality, the paper introduces a noise-resistant decoder that incorporates shape conditions, allowing it to decode robustly even when the transformer predicts imperfect token sequences.
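
To make the two-stage design concrete, here is a minimal sketch of the inference flow implied by the framework above. The module names and interfaces (`encoder`, `gpt`, `vqvae` and their methods) are hypothetical stand-ins for illustration, not the authors' released API.

```python
import torch

@torch.no_grad()
def generate_artist_mesh(point_cloud, encoder, gpt, vqvae, max_tokens=4096):
    """Hypothetical end-to-end inference: point cloud -> Artist-Created Mesh."""
    # 1) Encode the shape condition (a sampled point cloud) into prefix tokens.
    shape_tokens = encoder(point_cloud)          # (1, n_shape_tokens, d_model)

    # 2) Autoregressively sample discrete mesh tokens from the decoder-only
    #    transformer, conditioned on the shape-token prefix.
    mesh_token_ids = gpt.sample(prefix=shape_tokens, max_new_tokens=max_tokens)

    # 3) Decode tokens back into triangle faces with the noise-resistant
    #    VQ-VAE decoder, which also receives the shape condition.
    vertices, faces = vqvae.decode(mesh_token_ids, shape_tokens)
    return vertices, faces
```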

Methodological Innovations

Data Preparation and Shape Encoding

  • The authors leverage point clouds as the shape condition representation due to their continuous and explicit nature, facilitating easy conversion from various 3D representations.
  • Each training mesh is paired with a shape condition created by sampling a point cloud from the ground-truth mesh, with intentional quality reduction to mimic real-world application scenarios (a sketch of this pairing follows).
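
Below is a minimal sketch of building such a shape condition, assuming the `trimesh` library; the Gaussian jitter is an illustrative stand-in for the paper's intentional quality reduction, not the authors' exact procedure.

```python
import numpy as np
import trimesh

def make_shape_condition(mesh_path, num_points=8192, noise_std=0.005):
    """Sample a point cloud (with normals) from a ground-truth mesh."""
    mesh = trimesh.load(mesh_path, force='mesh')

    # Uniformly sample points on the surface; keep the per-face normals.
    points, face_idx = trimesh.sample.sample_surface(mesh, num_points)
    normals = mesh.face_normals[face_idx]

    # Degrade the samples slightly so the condition resembles the imperfect
    # shapes produced by real reconstruction/generation pipelines (assumed
    # stand-in for the paper's quality-reduction step).
    points = points + np.random.normal(scale=noise_std, size=points.shape)

    return np.concatenate([points, normals], axis=1)   # (num_points, 6)
```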

VQ-VAE for Mesh Vocabulary Learning

  • As stated in the abstract, a VQ-VAE is first trained to learn a mesh vocabulary: mesh face sequences are encoded into discrete token sequences drawn from a learned codebook, and its decoder reconstructs faces from those tokens. The transformer in the next stage is trained over this vocabulary.
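
The summary gives no further detail for this stage, so the following is a minimal sketch of the core quantization step under the standard VQ-VAE formulation (nearest-codebook lookup with a straight-through estimator); the paper's actual layer sizes and loss weights may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Standard VQ-VAE bottleneck: codebook lookup + straight-through gradients."""
    def __init__(self, num_codes=8192, dim=512, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta  # commitment loss weight

    def forward(self, z):                               # z: (batch, seq, dim)
        flat = z.reshape(-1, z.shape[-1])               # (batch*seq, dim)
        dist = torch.cdist(flat, self.codebook.weight)  # (batch*seq, num_codes)
        ids = dist.argmin(dim=-1).view(z.shape[:-1])    # (batch, seq)
        z_q = self.codebook(ids)                        # quantized vectors

        # Codebook loss pulls codes toward encoder outputs; commitment loss
        # keeps encoder outputs near their assigned codes.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())

        # Straight-through estimator: copy gradients through the lookup.
        z_q = z + (z_q - z).detach()
        return z_q, ids, loss
```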

Shape-Conditioned Autoregressive Transformer

  • The transformer is augmented with shape condition tokens derived from a pretrained point-cloud encoder. This integration enables the autoregressive model to generate meshes that adhere closely to the provided shapes (a hedged sampling sketch follows).
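
A hedged sketch of prefix-conditioned sampling follows; `gpt.embed`, the call signature, and the special token ids are assumptions made for illustration, not the paper's interface.

```python
import torch

@torch.no_grad()
def sample_with_shape_prefix(gpt, shape_tokens, bos_id, eos_id,
                             max_new_tokens=4096, temperature=1.0):
    """Sample mesh tokens with shape tokens prepended as a conditioning prefix."""
    generated = torch.tensor([[bos_id]], device=shape_tokens.device)
    for _ in range(max_new_tokens):
        tok_emb = gpt.embed(generated)                 # (1, t, d_model)
        x = torch.cat([shape_tokens, tok_emb], dim=1)  # shape prefix + mesh tokens
        logits = gpt(x)[:, -1, :] / temperature       # next-token distribution
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        generated = torch.cat([generated, next_id], dim=1)
        if next_id.item() == eos_id:                   # end-of-mesh token
            break
    return generated
```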

Experimental Validation

Qualitative Performance

  • MeshAnything demonstrates the ability to generate AMs that significantly reduce the number of faces and vertices while maintaining high-quality shape alignment, topology, and geometric feature representation.

Quantitative Results

  • Extensive experiments show that MeshAnything generates meshes with hundreds of times fewer faces compared to traditional methods like Marching Cubes and Remesh, while achieving competitive precision in metrics such as Chamfer Distance (CD) and Edge Chamfer Distance (ECD) (a reference sketch of CD follows this list).
  • The noise-resistant decoder notably improves the model's robustness to lower-quality token sequences, enhancing overall generated mesh quality.
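
For reference, one common convention for Chamfer Distance between point sets sampled from two meshes is sketched below (ECD applies the same idea to points sampled near sharp edges). Conventions differ on squaring and normalization, so treat this as an illustration rather than the paper's exact metric code.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p, q of shape (N, 3)."""
    d_pq, _ = cKDTree(q).query(p)   # nearest neighbour in q for each point of p
    d_qp, _ = cKDTree(p).query(q)   # nearest neighbour in p for each point of q
    return (d_pq ** 2).mean() + (d_qp ** 2).mean()
```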

Implications and Future Directions

Practical Applications

The practical implications of this research are profound, as MeshAnything enables the efficient generation of high-quality 3D assets for the gaming, film, and burgeoning metaverse industries. By aligning generated meshes to the quality of artist-created assets, this method promises to significantly reduce the labor costs and time associated with 3D model production.

Theoretical Impact and Future Research

The approach of treating mesh extraction as a generation problem opens new avenues for research in 3D asset production. Future work may explore scaling MeshAnything to large scenes and more complex objects. Further improvements in model stability and robustness will also be essential to move these advances into widespread application.

In conclusion, the MeshAnything framework presents a significant advance in 3D asset production, offering practical solutions for integrating automatically generated meshes into industrial pipelines. By addressing the inefficiencies of previous methods and proposing innovative architectural solutions, this research lays the groundwork for future developments in automated 3D modeling.
