Emergent Mind

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

(2311.15475)
Published Nov 27, 2023 in cs.CV and cs.LG

Abstract

We introduce MeshGPT, a new approach for generating triangle meshes that reflects the compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted by iso-surfacing methods from neural fields. Inspired by recent advances in powerful LLMs, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles. We first learn a vocabulary of latent quantized embeddings, using graph convolutions, which inform these embeddings of the local mesh geometry and topology. These embeddings are sequenced and decoded into triangles by a decoder, ensuring that they can effectively reconstruct the mesh. A transformer is then trained on this learned vocabulary to predict the index of the next embedding given previous embeddings. Once trained, our model can be autoregressively sampled to generate new triangle meshes, directly generating compact meshes with sharp edges, more closely imitating the efficient triangulation patterns of human-crafted meshes. MeshGPT demonstrates a notable improvement over state of the art mesh generation methods, with a 9% increase in shape coverage and a 30-point enhancement in FID scores across various categories.

A transformer generates mesh sequences from a codebook, optimized by a graph encoder and cross-entropy loss.

Overview

  • MeshGPT introduces a transformer-based method for generating triangle meshes that are compact, like artist-created meshes, rather than dense like meshes extracted from neural fields by iso-surfacing.

  • The system uses a graph convolutional encoder and residual quantization to create a vocabulary of latent geometric embeddings.

  • MeshGPT's autoregressive transformer then predicts these embeddings one index at a time, and a decoder turns the predicted embeddings into the triangles of a 3D mesh.

  • In tests with the ShapeNet dataset, MeshGPT outperformed existing mesh generation methods in shape coverage and fidelity.

  • MeshGPT points toward automated 3D content creation whose output keeps the compactness and sharp edges of hand-crafted meshes.

Generating 3D Triangle Meshes with MeshGPT: A Transformer-Based Approach

In the realm of computer graphics and 3D modeling, triangle meshes are foundational elements used to create detailed and intricate virtual shapes. These meshes are essential in applications ranging from video games and movies to scientific visualization and virtual reality. Traditional methods for creating these meshes often involve either artist-driven manual modeling or algorithmic generation, which can result in either highly detailed yet resource-intensive creations or efficient but less detail-oriented models.

The novel method introduced by MeshGPT seeks to harmonize these two approaches by using a transformer-based architecture to generate triangle meshes that are both compact and rich in geometric detail. Inspired by breakthroughs in LLMs, MeshGPT is a decoder-only transformer model trained to autoregressively produce sequences of triangular faces that constitute a 3D mesh.
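As a rough illustration of what "a sequence of triangular faces" can look like, the sketch below flattens a small indexed mesh into an ordered list of triangles, each described by nine coordinates. The ordering rule (rotate each face so its smallest corner comes first, then sort the faces) and the name `mesh_to_sequence` are assumptions made for this example; the summary does not state which ordering the authors use.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_to_sequence(vertices: List[Vertex], faces: List[Face]) -> List[Tuple[float, ...]]:
    """Flatten an indexed triangle mesh into an ordered sequence of faces.

    Each output element holds the 9 coordinates (3 corners x 3 axes) of one
    triangle. The canonical ordering is an assumption of this sketch.
    """
    triangles = []
    for face in faces:
        corners = [vertices[i] for i in face]
        # Rotate so the smallest corner comes first. A rotation (unlike a
        # full sort) preserves the winding order of the triangle.
        start = corners.index(min(corners))
        rotated = corners[start:] + corners[:start]
        triangles.append(tuple(c for corner in rotated for c in corner))
    # Sort faces so the same mesh always maps to the same sequence.
    triangles.sort()
    return triangles


# A unit square split into two triangles.
square_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_faces = [(2, 3, 0), (0, 1, 2)]
for triangle in mesh_to_sequence(square_vertices, square_faces):
    print(triangle)
```

A fixed ordering matters for autoregressive models because the same mesh should always produce the same training sequence, whichever order its faces were stored in.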

To begin, MeshGPT learns a vocabulary of latent geometric embeddings using graph convolutions, which inform these embeddings of the local geometry and topology of existing meshes. Once the embeddings are tokenized, a decoder can reconstruct intricate mesh details from the resulting token sequence.

Once equipped with its geometric vocabulary, MeshGPT's transformer model can predict the next token (embedding index) in a sequence, building upon previously generated tokens. This autoregressive sampling effectively generates new meshes that are not only detailed but also possess definition and sharpness akin to manual artistic modeling.
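The autoregressive sampling loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_logits` is a hypothetical stand-in for a trained transformer, and the four-token vocabulary, the end-of-sequence token, and the temperature parameter are all assumptions of the example.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_sequence(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    end_token: int,
    max_len: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Draw tokens one at a time, each conditioned on all previous tokens."""
    rng = random.Random(seed)
    tokens: List[int] = []
    while len(tokens) < max_len:
        logits = next_token_logits(tokens)
        # Softmax with temperature, shifted by the max for numerical stability.
        scaled = [value / temperature for value in logits]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if token == end_token:
            break
        tokens.append(token)
    return tokens


def toy_logits(prefix: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a trained model over tokens 0..3.

    It strongly favours (last token + 1) mod 3, and favours the end
    token (3) once six tokens have been produced.
    """
    logits = [0.0, 0.0, 0.0, 0.0]
    if len(prefix) >= 6:
        logits[3] = 50.0
    else:
        favoured = (prefix[-1] + 1) % 3 if prefix else 0
        logits[favoured] = 50.0
    return logits


print(sample_sequence(toy_logits, end_token=3, max_len=20))
```

In a real system the sampled indices would be looked up in the learned vocabulary and passed to the decoder to obtain triangles; the toy model here only demonstrates the control flow.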

Unlike alternatives such as point clouds or neural-field-based techniques, whose outputs require post-processing before they can be used, MeshGPT directly generates the kind of triangle mesh used in industry. The model outperforms existing methods in shape coverage and fidelity, two critical aspects of any 3D shape generation technique.

MeshGPT reports a notable improvement over state-of-the-art mesh generation techniques: a 9% increase in shape coverage and a 30-point improvement in Fréchet Inception Distance (FID) scores across various categories.

To build the vocabulary of embeddings used for mesh generation, the researchers employ a graph convolutional encoder that extracts geometric features, which are then discretized with residual quantization. A decoder maps the quantized features back to the vertex coordinates of the mesh triangles. The GPT-style transformer is trained on sequences of these quantized embeddings to predict the index of the next one, and the embeddings it samples are decoded into the triangles of a new mesh.
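Residual quantization, in its generic form, encodes a vector in stages: the first codebook captures a coarse approximation, and each later codebook encodes what the earlier stages left over. The sketch below shows that generic idea with tiny hand-written codebooks. The codebook values, the two-stage depth, and the use of a separate codebook per stage are assumptions for illustration; in MeshGPT the codebooks are learned, and the summary does not give their configuration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def residual_quantize(
    v: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize v in stages; each stage encodes the remaining residual."""
    residual = list(v)
    approx = [0.0] * len(v)
    indices: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the reconstruction and shrink the residual.
        approx = [a + c for a, c in zip(approx, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, approx


# Hypothetical codebooks: a coarse stage followed by a fine stage.
coarse = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
fine = [(0.0, 0.0), (0.25, -0.25), (-0.25, 0.25)]

indices, approx = residual_quantize((1.2, 0.8), [coarse, fine])
print(indices, approx)
```

The list of indices is the discrete token representation of the feature, and summing the selected codebook entries gives the approximation that a decoder would receive.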

The researchers conducted extensive experiments on the ShapeNet dataset, using established metrics such as Minimum Matching Distance (MMD), Coverage (COV), and 1-Nearest-Neighbor Accuracy (1-NNA) to demonstrate the effectiveness of MeshGPT.
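For readers unfamiliar with these metrics, the sketch below implements their commonly used definitions over a generated set and a reference set, given any pairwise distance function. It is an illustrative sketch only: shape-generation work typically uses a point-cloud distance such as Chamfer distance, whereas the toy example here represents each "shape" as a single number and uses the absolute difference, which is purely an assumption to keep the example short.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
Dist = Callable[[S, S], float]


def coverage(generated: Sequence[S], reference: Sequence[S], dist: Dist) -> float:
    """COV: fraction of reference shapes that are the nearest reference
    neighbour of at least one generated shape. Higher is better."""
    matched = {
        min(range(len(reference)), key=lambda j: dist(g, reference[j]))
        for g in generated
    }
    return len(matched) / len(reference)


def minimum_matching_distance(
    generated: Sequence[S], reference: Sequence[S], dist: Dist
) -> float:
    """MMD: mean distance from each reference shape to its closest
    generated shape. Lower is better."""
    return sum(min(dist(r, g) for g in generated) for r in reference) / len(reference)


def one_nn_accuracy(generated: Sequence[S], reference: Sequence[S], dist: Dist) -> float:
    """1-NNA: leave-one-out accuracy of a 1-nearest-neighbour classifier
    that tries to tell generated from reference shapes. Values near 0.5
    mean the two sets are hard to distinguish."""
    pool = [(s, 0) for s in generated] + [(s, 1) for s in reference]
    correct = 0
    for i, (shape, label) in enumerate(pool):
        others = [j for j in range(len(pool)) if j != i]
        nn = min(others, key=lambda j: dist(shape, pool[j][0]))
        correct += int(pool[nn][1] == label)
    return correct / len(pool)


def toy_dist(a: float, b: float) -> float:
    """Hypothetical distance between toy one-number 'shapes'."""
    return abs(a - b)


generated = [0.0, 0.1, 5.0]
reference = [0.05, 5.2, 9.0]
print(coverage(generated, reference, toy_dist))
print(minimum_matching_distance(generated, reference, toy_dist))
print(one_nn_accuracy(generated, reference, toy_dist))
```

In the toy data, no generated shape lies near the reference shape at 9.0, so coverage is below 1 and that shape dominates the MMD.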

MeshGPT opens up new possibilities for 3D mesh generation, with potential impact across the industries that rely on 3D content creation. It illustrates a promising direction in which transformer models automate mesh creation without compromising the detail and quality of manually crafted 3D meshes.

