Emergent Mind

Mosaic-SDF for 3D Generative Models

(2312.09222)
Published Dec 14, 2023 in cs.CV and cs.GR

Abstract

Current diffusion or flow-based generative models for 3D shapes divide into two groups: those that distill pre-trained 2D image diffusion models, and those that train directly on 3D shapes. When training diffusion or flow models on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow an efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff of approximation power versus number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient, as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation with the 3D Warehouse dataset, and text-to-3D generation using a dataset of about 600k caption-shape pairs.

Overview

  • Mosaic-SDF provides a parameter-efficient representation for 3D shapes, making high-quality shape synthesis more computationally accessible.

  • The representation uses small local grids placed near the shape's boundary, so it can be computed quickly and independently for each shape and fits Transformer-based neural networks.

  • Generative models using Mosaic-SDF demonstrated improved performance in producing high-fidelity shapes with fine details across various evaluation metrics.

  • M-SDF-based models can perform class-conditioned and text-conditioned 3D shape generation, creating detailed and relevant shapes from textual prompts.

  • The paper identifies potential improvements for Mosaic-SDF, such as integrating texture and color data, and encouraging orientation equivariance for better generalization.

Understanding Mosaic-SDF for 3D Shape Generation

The Challenge in 3D Shape Synthesis

3D shape synthesis is central to applications such as virtual reality, gaming, and computer-aided design. Despite recent progress, generating high-quality 3D shapes remains computationally expensive. Existing methods fall into two groups: optimization-based approaches, which are precise but generally slow because they require training a new model for each sample, and forward-based approaches, which struggle to capture the full space of shapes because of suboptimal shape representations.

Introducing Mosaic-SDF

A new representation, Mosaic-SDF (M-SDF), aims to overcome these limitations by providing a fast-to-compute, parameter-efficient, and tensor-compatible approximation of a 3D shape's Signed Distance Function (SDF). M-SDF uses small local grids positioned near the shape's boundary and takes the form of a matrix in which each row corresponds to one grid. Because of this organization, M-SDF can be computed quickly and independently for each shape, which makes the conversion of a dataset easy to parallelize, and it is well suited to Transformer-based neural architectures.
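The matrix form described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the paper's implementation: the row layout (center, scale, flattened grid values), the tiny resolution K = 3, the helper names make_row, trilinear, and eval_msdf, and the blending weight max(0, 1 - ||u||_inf) are all assumptions made for the example.

```python
import math
from itertools import product

K = 3  # samples per grid axis (assumed, kept tiny for readability)

def make_row(center, scale, sdf):
    """Build one matrix row: [cx, cy, cz, scale, K**3 SDF samples]."""
    axis = [-1.0 + 2.0 * i / (K - 1) for i in range(K)]  # local coords in [-1, 1]
    values = [sdf(tuple(center[d] + scale * u[d] for d in range(3)))
              for u in product(axis, repeat=3)]
    return list(center) + [scale] + values

def trilinear(values, u):
    """Trilinearly interpolate a flattened K*K*K grid at local coords u."""
    idx, frac = [], []
    for d in range(3):
        t = (u[d] + 1.0) * (K - 1) / 2.0      # continuous grid index
        i = min(int(math.floor(t)), K - 2)    # lower corner of the cell
        idx.append(i)
        frac.append(t - i)
    out = 0.0
    for corner in product((0, 1), repeat=3):
        w = 1.0
        for d in range(3):
            w *= frac[d] if corner[d] else 1.0 - frac[d]
        i, j, k = (idx[d] + corner[d] for d in range(3))
        out += w * values[(i * K + j) * K + k]
    return out

def eval_msdf(rows, x):
    """Blend the local grids covering x; return None if no grid covers x."""
    num = den = 0.0
    for row in rows:
        center, scale, values = row[:3], row[3], row[4:]
        u = [(x[d] - center[d]) / scale for d in range(3)]
        w = max(0.0, 1.0 - max(abs(c) for c in u))  # assumed blending weight
        if w > 0.0:
            num += w * trilinear(values, u)
            den += w
    return num / den if den > 0.0 else None

# Two overlapping grids sampled from the SDF of the plane x = 0.5.
plane = lambda p: p[0] - 0.5
rows = [make_row((0.5, 0.0, 0.0), 0.2, plane),
        make_row((0.6, 0.1, 0.0), 0.2, plane)]
print(len(rows), "rows of length", len(rows[0]))
print(eval_msdf(rows, (0.55, 0.05, 0.01)))
```

Every row has the same length (4 + K**3 in this sketch), so a shape becomes a fixed-size matrix that can be fed to a Transformer as a set of tokens, and each row depends only on SDF samples near its own center, which is why rows and shapes can be processed independently.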

M-SDF in Practice

To demonstrate the effectiveness of M-SDF, the researchers trained a forward-based generative flow model on this representation using a large dataset of 3D shapes, and reported high-quality, diverse generated shapes. Compared with volumetric grids, Triplanes, and Implicit Neural Representations, M-SDF substantially reduced the time and computational resources needed to approximate SDFs while preserving high-resolution details.
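This summary does not say which objective was used to train the flow model. One common choice for flow-based models is flow matching with a straight-line path between noise and data, and the sketch below shows what a single loss evaluation could look like on an M-SDF-style matrix. The function name flow_matching_loss, the velocity_model callable, and the toy data are hypothetical and are not taken from the paper.

```python
import random

def flow_matching_loss(velocity_model, x1, rng):
    """One Monte Carlo sample of a flow matching loss (assumed objective).

    x1 is a data sample stored as a list of rows (for example an M-SDF matrix).
    velocity_model(xt, t) is a hypothetical callable that returns a matrix
    with the same shape as xt.
    """
    x0 = [[rng.gauss(0.0, 1.0) for _ in row] for row in x1]   # noise sample
    t = rng.random()                                          # time in [0, 1)
    # Point on the straight line from noise (t = 0) to data (t = 1).
    xt = [[(1.0 - t) * a + t * b for a, b in zip(r0, r1)]
          for r0, r1 in zip(x0, x1)]
    # Velocity of that line, which the model is trained to predict.
    target = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    pred = velocity_model(xt, t)
    n = sum(len(row) for row in x1)
    return sum((p - v) ** 2
               for rp, rv in zip(pred, target)
               for p, v in zip(rp, rv)) / n

# Toy usage: a "model" that always predicts zero velocity.
x1 = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
zero_model = lambda xt, t: [[0.0 for _ in row] for row in xt]
print(flow_matching_loss(zero_model, x1, random.Random(0)))
```

In an actual training loop the velocity model would be a neural network (the abstract mentions Transformer-based architectures) and this loss would be averaged over batches and minimized with a gradient-based optimizer.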

Evaluations and Results

The generative models trained on M-SDF were evaluated with geometric distance-based metrics (Coverage, Minimum Matching Distance, and 1-Nearest Neighbor Accuracy) and with perceptual distance metrics (Fréchet PointNet++ Distance and Kernel PointNet++ Distance). M-SDF-based models compared favorably with existing methods, generating higher-fidelity shapes with finer details.
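The three geometric metrics named above have standard definitions that are easy to sketch. The code below is a minimal illustration using a squared-distance Chamfer distance between small point sets; the exact distance, point counts, and normalization used in the paper's evaluation are not given in this summary, so those choices are assumptions.

```python
def chamfer(a, b):
    """Symmetric Chamfer distance (squared Euclidean) between two point sets."""
    def sq(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    a_to_b = sum(min(sq(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def coverage(generated, reference, distance=chamfer):
    """Fraction of reference shapes that are nearest to some generated shape."""
    matched = {min(range(len(reference)),
                   key=lambda r: distance(g, reference[r]))
               for g in generated}
    return len(matched) / len(reference)

def minimum_matching_distance(generated, reference, distance=chamfer):
    """Mean distance from each reference shape to its closest generated shape."""
    return sum(min(distance(g, r) for g in generated)
               for r in reference) / len(reference)

def one_nn_accuracy(generated, reference, distance=chamfer):
    """Leave-one-out 1-NN accuracy on the union of both sets.

    A value near 0.5 means the two sets are hard to tell apart.
    """
    samples = [(s, 0) for s in generated] + [(s, 1) for s in reference]
    correct = 0
    for i, (shape, label) in enumerate(samples):
        nearest = min((j for j in range(len(samples)) if j != i),
                      key=lambda j: distance(shape, samples[j][0]))
        correct += samples[nearest][1] == label
    return correct / len(samples)

# Toy usage with single-point "shapes".
generated = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
reference = [[(0.0, 0.0, 0.1)], [(5.0, 0.0, 0.0)]]
print(coverage(generated, reference))
print(minimum_matching_distance(generated, reference))
print(one_nn_accuracy(generated, reference))
```

Higher Coverage and lower Minimum Matching Distance are better, while 1-Nearest Neighbor Accuracy is best when it is close to 50%. The perceptual metrics (Fréchet and Kernel PointNet++ Distance) require features from a pre-trained PointNet++ network and are not sketched here.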

Moreover, the flexibility of the M-SDF representation was showcased through class-conditioned and text-conditioned 3D shape generation tasks. The model produced a broad spectrum of shapes from various classes with intricate structures and responded to textual prompts with relevant 3D shapes.

Future Directions

Although M-SDF has proven to be a robust representation for 3D shapes, there's room for expansion. Future efforts could include integrating texture, color, and lighting data, enhancing the representation structure through convolution layers or autoencoders, and developing orientation equivariance to bolster the model’s generalization capabilities.

Mosaic-SDF represents a significant step toward efficient, high-quality generation of 3D shapes, and it may accelerate progress in fields that rely on synthetic three-dimensional data.
