Mosaic-SDF for 3D Generative Models (2312.09222v2)

Published 14 Dec 2023 in cs.CV and cs.GR

Abstract: Current diffusion or flow-based generative models for 3D shapes fall into two categories: distilling pre-trained 2D image diffusion models, and training directly on 3D shapes. When training a diffusion or flow model on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow an efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff of approximation power versus number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation with the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.

Authors (5)

Lior Yariv (8 papers)
Omri Puny (8 papers)
Natalia Neverova (36 papers)
Oran Gafni (14 papers)
Yaron Lipman (55 papers)

Summary

The paper introduces Mosaic-SDF, a novel representation that efficiently approximates 3D shape SDFs using localized grids and tensor-compatible structures.
The research demonstrates that generative models using Mosaic-SDF produce high-fidelity, diverse 3D shapes with significantly reduced computational costs.
The model excels in class-conditioned and text-conditioned generation tasks, paving the way for future enhancements like texture and lighting integration.

Understanding Mosaic-SDF for 3D Shape Generation

The Challenge in 3D Shape Synthesis

3D shape synthesis underpins applications such as virtual reality, gaming, and computer-aided design. Despite recent progress, generating high-quality 3D shapes remains computationally expensive. Existing methods fall into two groups: optimization-based approaches, which distill pre-trained 2D image diffusion models and can be precise but are generally slow because each sample requires its own optimization run, and forward-based approaches, which train directly on 3D shapes and generate in a single forward pass but are held back by shape representations that are costly to compute, parameter-heavy, or awkward for standard neural architectures.

Introducing Mosaic-SDF

A new representation, Mosaic-SDF (M-SDF), aims to overcome these limitations by providing a fast-to-compute, parameter-efficient, and tensor-compatible approximation of a shape's Signed Distance Function (SDF). M-SDF uses a set of small local grids positioned near the shape's boundary and takes the form of a matrix in which each row corresponds to one grid. Because the representation is computed independently for each shape, dataset conversion is readily parallelizable, and the matrix form makes it suitable for Transformer-based neural architectures.
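To make the matrix form concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy sphere as the shape, a row layout of (grid center, grid scale, flattened grid values), and a simple hat-function blend of trilinearly interpolated local grids; the function names, the row layout, and the blending weights are all illustrative assumptions rather than details given in this summary.

```python
import numpy as np

def sphere_sdf(x, radius=0.5):
    """Exact SDF of a sphere, used here as a stand-in shape (assumption)."""
    return np.linalg.norm(x, axis=-1) - radius

def build_msdf(n=64, k=5, scale=0.3, radius=0.5, seed=0):
    """Pack n local k*k*k grids placed on the surface into an (n, 4 + k^3) matrix."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n, 3))
    centers *= radius / np.linalg.norm(centers, axis=1, keepdims=True)
    axis = np.linspace(-1.0, 1.0, k)
    offsets = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    pts = centers[:, None, None, None, :] + scale * offsets  # (n, k, k, k, 3)
    values = sphere_sdf(pts, radius).reshape(n, -1)          # (n, k^3)
    scales = np.full((n, 1), scale)
    return np.concatenate([centers, scales, values], axis=1)

def trilinear(grid, u):
    """Trilinear interpolation of a (k, k, k) grid at local coords u in [-1, 1]^3."""
    k = grid.shape[0]
    t = (u + 1.0) * 0.5 * (k - 1)
    i0 = np.clip(np.floor(t).astype(int), 0, k - 2)
    f = t - i0
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dz else 1 - f[2]))
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

def eval_msdf(msdf, x):
    """Blend all local grids that contain x; returns None if no grid covers x."""
    k = round((msdf.shape[1] - 4) ** (1.0 / 3.0))
    num, den = 0.0, 0.0
    for row in msdf:
        p, s, v = row[:3], row[3], row[4:]
        u = (x - p) / s
        w = max(0.0, 1.0 - np.max(np.abs(u)))  # hat weight, zero outside the grid
        if w > 0.0:
            num += w * trilinear(v.reshape(k, k, k), u)
            den += w
    return num / den if den > 0.0 else None

msdf = build_msdf()
on_surface = eval_msdf(msdf, msdf[0, :3])             # query at a grid center
far_away = eval_msdf(msdf, np.array([5.0, 5.0, 5.0]))  # outside every grid
```

In this toy setup the whole shape is a 64 x 129 matrix (8,256 numbers), and queries far from the boundary are simply not covered, which illustrates why the representation only spends parameters near the surface.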

M-SDF in Practice

To demonstrate the effectiveness of M-SDF, the authors trained a forward-based generative flow model on this representation. Training used the 3D Warehouse dataset for class-conditioned generation and a dataset of about 600k caption-shape pairs for text-to-3D generation, and the resulting models generated high-quality, diverse shapes. According to the paper, M-SDF reduces the time and compute needed to approximate SDFs compared with volumetric grids, Triplanes, and Implicit Neural Representations, while retaining high-resolution detail.
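As a rough illustration of what training and sampling a flow model on such matrices can look like, the sketch below assumes a flow-matching objective with a straight-line path between noise and data. This summary does not say which path or loss the authors used, so treat that choice, the function names, and the random stand-in matrix as assumptions.

```python
import numpy as np

def fm_training_pair(x1, rng):
    """Build one regression example: noisy point x_t, time t, target velocity.

    Assumes the straight-line path x_t = (1 - t) * x0 + t * x1 (an assumption).
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform()
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0                    # velocity of the straight-line path
    return xt, t, target

def euler_sample(velocity, x0, steps=50):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x

rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 12))  # random stand-in for one M-SDF matrix

# Hypothetical "oracle" velocity for a dataset containing only x1. A trained
# Transformer would take the place of this function in practice.
oracle = lambda x, t: (x1 - x) / (1.0 - t)
sample = euler_sample(oracle, rng.standard_normal(x1.shape))
```

With the oracle field, Euler integration carries any noise sample to `x1`, which is a quick sanity check that the sampler and the path convention agree.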

Evaluations and Results

The performance of generative models using M-SDF was evaluated through several metrics, including geometric distance-based metrics like Coverage, Minimum Matching Distance, and 1-Nearest Neighbor Accuracy, further supported by perceptual distance metrics like Frechet PointNet++ Distance and Kernel PointNet++ Distance. M-SDF-based models performed favorably against existing methods, demonstrating an ability to generate higher fidelity shapes with enhanced details.
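The three geometric metrics can be sketched as follows, using their common definitions from the point-cloud generation literature and a squared Chamfer distance between point clouds. The choice of Chamfer distance, the function names, and the toy data are assumptions; the summary does not specify the exact evaluation protocol.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric squared Chamfer distance between point clouds a (n, 3) and b (m, 3)."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def pairwise(set_a, set_b):
    """Matrix of Chamfer distances between every cloud in set_a and set_b."""
    return np.array([[chamfer(a, b) for b in set_b] for a in set_a])

def coverage(d_gen_ref):
    """Fraction of reference shapes that are the nearest neighbor of some generated shape."""
    return np.unique(d_gen_ref.argmin(axis=1)).size / d_gen_ref.shape[1]

def mmd(d_gen_ref):
    """Mean, over reference shapes, of the distance to the closest generated shape."""
    return d_gen_ref.min(axis=0).mean()

def one_nna(d_gg, d_gr, d_rr):
    """Leave-one-out 1-NN accuracy on the union of both sets; 0.5 is the ideal value."""
    n_g, n_r = d_gr.shape
    full = np.block([[d_gg, d_gr], [d_gr.T, d_rr]])
    np.fill_diagonal(full, np.inf)  # a sample may not vote for itself
    labels = np.array([0] * n_g + [1] * n_r)
    pred = labels[full.argmin(axis=1)]
    return float((pred == labels).mean())

rng = np.random.default_rng(0)
ref = [rng.uniform(size=(32, 3)) for _ in range(4)]  # toy reference clouds
gen_copy = [r.copy() for r in ref]                   # generator that memorizes
gen_far = [r + 100.0 for r in ref]                   # generator far from the data

d_gr = pairwise(gen_copy, ref)
cov_copy, mmd_copy = coverage(d_gr), mmd(d_gr)
nna_copy = one_nna(pairwise(gen_copy, gen_copy), d_gr, pairwise(ref, ref))
nna_far = one_nna(pairwise(gen_far, gen_far), pairwise(gen_far, ref), pairwise(ref, ref))
```

The two toy generators show why 1-NNA is read against 0.5: exact copies drive it toward 0 and an easily separable generator drives it toward 1, while Coverage and MMD alone would reward the memorizing generator.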

Moreover, the flexibility of the M-SDF representation was showcased through class-conditioned and text-conditioned 3D shape generation tasks. The model produced a broad spectrum of shapes from various classes with intricate structures and responded to textual prompts with relevant 3D shapes.

Future Directions

Although M-SDF has proven to be a robust representation for 3D shapes, there's room for expansion. Future efforts could include integrating texture, color, and lighting data, enhancing the representation structure through convolution layers or autoencoders, and developing orientation equivariance to bolster the model’s generalization capabilities.

Mosaic-SDF presents a significant step forward in the efficient and high-quality generation of 3D forms, promising to accelerate advances in fields that rely on synthetic three-dimensional data.
