MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers (2311.15475v1)

Published 27 Nov 2023 in cs.CV and cs.LG

Abstract: We introduce MeshGPT, a new approach for generating triangle meshes that reflects the compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted by iso-surfacing methods from neural fields. Inspired by recent advances in powerful LLMs, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles. We first learn a vocabulary of latent quantized embeddings, using graph convolutions, which inform these embeddings of the local mesh geometry and topology. These embeddings are sequenced and decoded into triangles by a decoder, ensuring that they can effectively reconstruct the mesh. A transformer is then trained on this learned vocabulary to predict the index of the next embedding given previous embeddings. Once trained, our model can be autoregressively sampled to generate new triangle meshes, directly generating compact meshes with sharp edges, more closely imitating the efficient triangulation patterns of human-crafted meshes. MeshGPT demonstrates a notable improvement over state of the art mesh generation methods, with a 9% increase in shape coverage and a 30-point enhancement in FID scores across various categories.

Summary

The paper introduces a decoder-only transformer that autoregressively generates compact, detailed triangle meshes from a learned vocabulary of geometric embeddings.
It leverages graph convolutional encoding with residual quantization to tokenize local geometric features from existing mesh structures.
Experiments on ShapeNet show a 9% increase in shape coverage and a 30-point improvement in FID scores (lower is better) compared to previous methods.

Generating 3D Triangle Meshes with MeshGPT: A Transformer-Based Approach

In the field of computer graphics and 3D modeling, triangle meshes are foundational elements used to create detailed and intricate virtual shapes. These meshes are essential in applications ranging from video games and movies to scientific visualization and virtual reality. Traditional methods for creating these meshes often involve either artist-driven manual modeling or algorithmic generation, which can result in either highly detailed yet resource-intensive creations or efficient but less detail-oriented models.

MeshGPT seeks to combine the strengths of these two approaches by using a transformer-based architecture to generate triangle meshes that are both compact and rich in geometric detail. Inspired by breakthroughs in large language models (LLMs), MeshGPT is a decoder-only transformer model trained to autoregressively produce sequences of triangular faces that constitute a 3D mesh.

To begin, MeshGPT learns a vocabulary of latent geometric embeddings using graph convolutions, which inform the embeddings of the local geometry and topology of existing meshes. The embeddings are quantized into tokens, and a decoder reconstructs the mesh triangles from them.

Once equipped with its geometric vocabulary, MeshGPT's transformer model can predict the next token (embedding index) in a sequence, building upon previously generated tokens. This autoregressive sampling effectively generates new meshes that are not only detailed but also possess definition and sharpness akin to manual artistic modeling.
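The sampling loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `sample_sequence`, `toy_logits`, the stop token, and the vocabulary size of 8 are all hypothetical, and `toy_logits` merely stands in for a trained transformer that would return next-token logits given the tokens generated so far.

```python
import numpy as np

def sample_sequence(next_token_logits, vocab_size, stop_token, max_len,
                    temperature=1.0, seed=0):
    """Sample token indices one at a time until stop_token or max_len.

    next_token_logits: callable taking the list of tokens so far and
    returning an array of vocab_size logits (hypothetical model interface).
    temperature must be positive.
    """
    rng = np.random.default_rng(seed)
    tokens = []
    for _ in range(max_len):
        logits = np.asarray(next_token_logits(tokens), dtype=np.float64)
        # Temperature-scaled softmax, shifted by the max for numerical stability.
        z = logits / temperature
        z = z - z.max()
        probs = np.exp(z)
        probs = probs / probs.sum()
        tok = int(rng.choice(vocab_size, p=probs))
        if tok == stop_token:
            break
        tokens.append(tok)
    return tokens

def toy_logits(prefix, vocab_size=8, stop_token=7):
    """Stand-in for a trained model (assumption, for illustration only)."""
    logits = np.zeros(vocab_size)
    if prefix:
        # Favour the successor of the last token among the non-stop tokens.
        logits[(prefix[-1] + 1) % (vocab_size - 1)] = 5.0
    # Make stopping more likely as the sequence grows.
    logits[stop_token] = len(prefix) - 6.0
    return logits

tokens = sample_sequence(toy_logits, vocab_size=8, stop_token=7, max_len=32)
print(tokens)
```

In a real system the sampled indices would then be looked up in the learned vocabulary and decoded into triangles; that step is omitted here.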

In contrast to alternatives such as point clouds or neural-field-based techniques, which require post-processing before they can be used, MeshGPT directly generates the kind of triangle mesh desired in the industry. The paper reports that the model outperforms existing methods in shape coverage and fidelity, both critical aspects of any 3D shape generation technique.

MeshGPT demonstrates a significant improvement over state-of-the-art mesh generation techniques, with an average 9% increase in shape coverage and a 30-point improvement in Fréchet Inception Distance (FID) scores across various categories.

To build the embedding vocabulary, the researchers employ a graph convolutional encoder to extract geometric features, which are then quantized using residual quantization. A decoder maps the quantized features to vertex coordinates for the mesh triangles. The GPT-style transformer is trained on the resulting token indices; at generation time, the tokens it samples are passed through the decoder to produce the mesh.
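As a rough illustration of the residual quantization idea, the sketch below quantizes a feature vector with a stack of codebooks: each level picks the nearest codeword to whatever residual the previous levels left behind. The function name, codebook sizes, and random codebooks are assumptions made for this example; in the paper the codebooks are learned jointly with the encoder and decoder, which is not shown here.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize x with a sequence of codebooks (generic residual quantization).

    Returns the chosen index per level and the reconstruction, which is
    the sum of the selected codewords.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Nearest codeword to the current residual (squared Euclidean distance).
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        recon += codebook[idx]
        residual -= codebook[idx]
    return indices, recon

# Hypothetical setup: 3 levels, 16 codewords each, 4-dimensional features.
rng = np.random.default_rng(0)
dim, depth, codebook_size = 4, 3, 16
codebooks = [rng.normal(scale=1.0 / 2 ** level, size=(codebook_size, dim))
             for level in range(depth)]
x = rng.normal(size=dim)

indices, recon = residual_quantize(x, codebooks)
print("token indices:", indices)
print("reconstruction error:", float(np.linalg.norm(x - recon)))
```

Each feature is thus represented by several small indices rather than one index into a very large codebook, which is the usual motivation for the residual scheme.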

The researchers conducted extensive experiments with the ShapeNet dataset, utilizing known metrics such as Minimum Matching Distance (MMD), Coverage (COV), and 1-Nearest-Neighbor Accuracy (1-NNA) to demonstrate the effectiveness of MeshGPT.
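For readers unfamiliar with these metrics, the sketch below computes them from pairwise distances using their commonly used definitions. It assumes shapes are compared as sampled point clouds under the Chamfer distance, which is a typical choice but is not confirmed by this summary; the function names and random toy data are hypothetical.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (n, 3) and b (m, 3)."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def pairwise(xs, ys):
    """Matrix of Chamfer distances between two lists of point sets."""
    return np.array([[chamfer(x, y) for y in ys] for x in xs])

def coverage(d_gr):
    """COV: fraction of reference shapes that are the nearest neighbour
    of at least one generated shape. d_gr has shape (generated, reference)."""
    return len(np.unique(d_gr.argmin(axis=1))) / d_gr.shape[1]

def mmd(d_gr):
    """MMD: mean, over reference shapes, of the distance to the closest
    generated shape (lower is better)."""
    return float(d_gr.min(axis=0).mean())

def one_nna(d_gg, d_gr, d_rr):
    """1-NNA: leave-one-out 1-nearest-neighbour accuracy on the union of
    generated and reference sets; values near 0.5 mean the two sets are
    hard to tell apart."""
    n_g, n_r = d_gr.shape
    full = np.block([[d_gg, d_gr], [d_gr.T, d_rr]]).astype(float)
    np.fill_diagonal(full, np.inf)  # a sample may not match itself
    labels = np.array([0] * n_g + [1] * n_r)
    nearest = full.argmin(axis=1)
    return float(np.mean(labels[nearest] == labels))

# Toy data: random point clouds standing in for sampled meshes.
rng = np.random.default_rng(0)
generated = [rng.normal(size=(64, 3)) for _ in range(5)]
reference = [rng.normal(size=(64, 3)) for _ in range(5)]

d_gr = pairwise(generated, reference)
d_gg = pairwise(generated, generated)
d_rr = pairwise(reference, reference)
print("COV:", coverage(d_gr))
print("MMD:", mmd(d_gr))
print("1-NNA:", one_nna(d_gg, d_gr, d_rr))
```

Higher COV and lower MMD are better, while 1-NNA is best when close to 50%.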

MeshGPT opens up new possibilities for 3D mesh generation, with potential impact across industries that rely on 3D content creation. It illustrates a promising direction in which machine learning aligns closely with artisanal creativity, offering automated processes that do not compromise on the detail and quality of manually crafted 3D meshes.
