
Make-A-Shape: a Ten-Million-scale 3D Shape Model

(2401.11067)
Published Jan 20, 2024 in cs.CV and cs.GR

Abstract

Significant progress has been made in training large generative models for natural language and images. Yet, the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale, capable of utilizing 10 million publicly available shapes. On the technical side, we first innovate a wavelet-tree representation to compactly encode shapes by formulating the subband coefficient filtering scheme to efficiently exploit coefficient relations. We then make the representation generatable by a diffusion model by devising the subband coefficients packing scheme to lay out the representation in a low-resolution grid. Further, we derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients. Last, we extend our framework to be controlled by additional input conditions to enable it to generate shapes from assorted modalities, e.g., single/multi-view images, point clouds, and low-resolution voxels. In our extensive set of experiments, we demonstrate various applications, such as unconditional generation, shape completion, and conditional generation on a wide range of modalities. Our approach not only surpasses the state of the art in delivering high-quality results but also efficiently generates shapes within a few seconds, often achieving this in just 2 seconds for most conditions.

Overview

  • Make-A-Shape introduces an innovative wavelet-tree representation for large-scale 3D model training, effectively handling over ten million shapes.

  • The framework uses wavelet decomposition to retain both coarse and detail subband coefficients, enabling nearly lossless encoding of 3D shapes.

  • Training efficiency is achieved through a diffusion model paired with a subband adaptive training strategy that captures both coarse structure and fine shape details.

  • Make-A-Shape can perform conditional generation from various inputs, facilitating practical applications with diverse requirements.

  • Extensive experiments show the model's superior performance in generation tasks and its potential for zero-shot shape completion.

Introduction

In the pursuit of more advanced 3D generative models, there remains a gap in representation efficacy and training efficiency on large datasets. The Make-A-Shape framework is introduced to bridge this gap. Offering a comprehensive approach to efficient large-scale 3D model training, it handles over ten million shapes, marking a leap forward in addressing the prevalent issues in 3D generative modeling.

The Wavelet-Tree Representation

Make-A-Shape innovates with the wavelet-tree representation, applying a wavelet decomposition to a high-resolution signed distance function (SDF) grid. This yields a representation that retains both coarse and detail subband coefficients, marrying expressiveness with compactness, a vital advantage for streaming and training on extensive 3D shape datasets. By retaining these coefficients rather than discarding high-frequency details for learning efficiency, the representation encodes 3D shapes nearly losslessly, in contrast to prior models that tend to lose detail for efficiency.
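To make the idea concrete, here is a minimal sketch, using PyWavelets, of decomposing an SDF grid into coarse and detail subbands and reconstructing it almost losslessly. The grid resolution, wavelet family, and decomposition level are illustrative assumptions, not the paper's exact settings.

```python
# A minimal sketch (not the authors' code) of decomposing a signed-distance
# grid into coarse and detail wavelet subbands with PyWavelets.
import numpy as np
import pywt

# Toy SDF grid: signed distance to a sphere on a 64^3 lattice
# (the paper operates on much higher-resolution grids).
res = 64
axis = np.linspace(-1.0, 1.0, res)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Multi-level 3D wavelet decomposition: coeffs[0] is the coarse approximation
# subband; coeffs[1:] hold the detail subbands per level, each a dict keyed
# by filter pattern ('aad', 'ada', ..., 'ddd').
coeffs = pywt.wavedecn(sdf, wavelet="bior6.8", level=3, mode="periodization")
coarse = coeffs[0]
print("coarse subband:", coarse.shape)
for lvl, detail in enumerate(coeffs[1:], start=1):
    print(f"level {lvl} detail subbands:", {k: v.shape for k, v in detail.items()})

# Keeping both coarse and detail coefficients lets the grid be reconstructed
# almost exactly (floating-point error only):
recon = pywt.waverecn(coeffs, wavelet="bior6.8", mode="periodization")
print("max reconstruction error:", np.abs(recon - sdf).max())
```

Dropping the detail dictionaries before reconstruction would recover only a blurred shape, which is exactly the detail loss the wavelet-tree representation is designed to avoid.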

Efficient Training with the Diffusion Model

The model overcomes inefficient learning by packing the wavelet-tree coefficients into a diffusible grid layout amenable to a diffusion-based generative model. A subband adaptive training strategy ensures the model captures the full spectrum of shape detail, from coarse structure to fine geometric detail, avoiding the collapse or ineffective learning that a naive mean-squared-error objective could cause.
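The sketch below illustrates, under assumed dimensions and weights, two of the ideas above: folding a finer detail subband into extra channels of the coarse grid so the diffusion network sees a single low-resolution tensor, and a subband-weighted MSE in place of a single naive MSE. It is a hedged approximation, not the paper's exact packing or training scheme.

```python
# Hedged PyTorch sketch: (1) pack a higher-resolution detail subband into
# extra channels of the coarse grid; (2) weight coarse and detail losses
# separately instead of using one naive MSE. Weights and layout are
# illustrative assumptions.
import torch

def pack_subbands(coarse: torch.Tensor, detail: torch.Tensor) -> torch.Tensor:
    """coarse: (B, 1, R, R, R); detail: (B, C, 2R, 2R, 2R) -> (B, 1+8C, R, R, R)."""
    B, C, D, H, W = detail.shape
    # Fold each 2x2x2 block of the detail grid into the channel dimension.
    d = detail.reshape(B, C, D // 2, 2, H // 2, 2, W // 2, 2)
    d = d.permute(0, 1, 3, 5, 7, 2, 4, 6).reshape(B, C * 8, D // 2, H // 2, W // 2)
    return torch.cat([coarse, d], dim=1)

def subband_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                         coarse_weight: float = 1.0,
                         detail_weight: float = 0.25) -> torch.Tensor:
    """Balance the coarse channel against the many packed detail channels,
    so plentiful near-zero detail coefficients do not drown out coarse structure."""
    coarse_loss = (pred[:, :1] - target[:, :1]).pow(2).mean()
    detail_loss = (pred[:, 1:] - target[:, 1:]).pow(2).mean()
    return coarse_weight * coarse_loss + detail_weight * detail_loss

# Example with toy tensors (coarse 16^3 grid, 7 detail subbands at 32^3):
coarse = torch.randn(2, 1, 16, 16, 16)
detail = torch.randn(2, 7, 32, 32, 32)
packed = pack_subbands(coarse, detail)          # (2, 57, 16, 16, 16)
loss = subband_weighted_mse(torch.randn_like(packed), packed)
print(packed.shape, loss.item())
```

The packed tensor keeps the spatial footprint of the coarse subband, which is what makes a standard low-resolution diffusion backbone applicable to the full wavelet-tree representation.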

Conditional Generation Capability

Make-A-Shape also extends its utility to conditional generation, handling a variety of inputs. Different modalities, including single/multi-view images, point clouds, and low-resolution voxels, are accommodated by converting conditions into latent vectors, followed by employing these vectors in the generative network. This modular approach enables the framework to adapt to diverse inputs effortlessly, a characteristic that positions it for practical applications where conditions might differ significantly.
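A hypothetical sketch of such a conditioning pathway is shown below: a toy point-cloud encoder maps its input to a small set of latent vectors, and the generative network attends to them through cross-attention. The encoder design, token count, and dimensions are placeholders, not the paper's actual networks.

```python
# Minimal, hypothetical conditioning pathway: modality -> latent vectors ->
# cross-attention into the generator's feature tokens.
import torch
import torch.nn as nn

LATENT_DIM = 256

class PointCloudEncoder(nn.Module):
    """Toy encoder: per-point MLP, then attention pooling into a few latent tokens."""
    def __init__(self, num_tokens: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
        self.tokens = nn.Parameter(torch.randn(num_tokens, LATENT_DIM))
        self.attn = nn.MultiheadAttention(LATENT_DIM, num_heads=4, batch_first=True)

    def forward(self, points: torch.Tensor) -> torch.Tensor:  # (B, N, 3)
        feats = self.mlp(points)                               # (B, N, LATENT_DIM)
        queries = self.tokens.unsqueeze(0).expand(points.size(0), -1, -1)
        latents, _ = self.attn(queries, feats, feats)          # (B, num_tokens, LATENT_DIM)
        return latents

class CrossAttentionBlock(nn.Module):
    """Injects condition latents into the generator's feature tokens."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(LATENT_DIM, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(LATENT_DIM)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(self.norm(x), cond, cond)
        return x + attended                                    # residual update

# Example: 1024 generator feature tokens conditioned on a 2048-point cloud.
encoder, block = PointCloudEncoder(), CrossAttentionBlock()
cond = encoder(torch.randn(2, 2048, 3))
features = torch.randn(2, 1024, LATENT_DIM)
print(block(features, cond).shape)   # torch.Size([2, 1024, 256])
```

Because every modality is reduced to the same kind of latent vectors, swapping the point-cloud encoder for an image or voxel encoder leaves the generative network untouched, which is what makes the approach modular.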

Experiments and Results

The model's proficiency is evidenced by extensive experimental validation. It generates condition-aware 3D shapes that outperform the state of the art, particularly with image inputs, where it faithfully reproduces the visible parts of objects while presenting credible variations for the unseen parts. The framework also shows adaptability, swiftly adjusting to point cloud density variations and voxel resolutions without sacrificing quality.

Importantly, the framework paves the way for tasks beyond generation, such as zero-shot shape completion, where it can inventively fill gaps in partial inputs. This versatility extends the utility of Make-A-Shape into domains where object restoration or extrapolation is essential.
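One common way to realize such zero-shot completion with a diffusion model is inpainting-style sampling, sketched below under simplified assumptions: at every denoising step, the observed region of the grid is replaced by a re-noised copy of the partial input, so only the missing region is synthesized. The schedule, denoiser, and mask here are stand-ins, and the paper's exact procedure may differ.

```python
# Illustrative inpainting-style completion loop for a diffusion model over a
# (packed wavelet) grid. Schedule, denoiser, and mask are toy stand-ins.
import torch

def complete_shape(denoiser, known: torch.Tensor, mask: torch.Tensor,
                   alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """known: partial grid; mask: 1 where values are observed, 0 where missing."""
    x = torch.randn_like(known)
    for t in range(len(alphas_cumprod) - 1, -1, -1):
        a_bar = alphas_cumprod[t]
        # Re-noise the observed region to the current noise level and paste it in.
        noised_known = a_bar.sqrt() * known + (1 - a_bar).sqrt() * torch.randn_like(known)
        x = mask * noised_known + (1 - mask) * x
        # One (greatly simplified) reverse step: predict noise, take a DDIM-like update.
        eps = denoiser(x, t)
        x0_hat = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        x = a_bar_prev.sqrt() * x0_hat + (1 - a_bar_prev).sqrt() * eps
    return mask * known + (1 - mask) * x

# Toy usage with a dummy denoiser and a half-observed 16^3 grid.
dummy_denoiser = lambda x, t: torch.zeros_like(x)
alphas_cumprod = torch.linspace(0.999, 0.01, 50)
known = torch.randn(1, 1, 16, 16, 16)
mask = torch.zeros_like(known); mask[..., :8] = 1.0
print(complete_shape(dummy_denoiser, known, mask, alphas_cumprod).shape)
```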

Conclusions and Future Directions

Make-A-Shape marks a significant step in large-scale 3D shape modeling, providing a route to training generative models that synthesize high-quality outputs rapidly. One limitation, however, is the model's bias toward certain object categories due to training-data imbalance. In addition, the current focus is solely on geometry, without consideration of texture. Future work could mitigate these limitations by exploring category annotations and introducing texture into the generative process. The promise that Make-A-Shape holds for 3D content creation, simulation, and potentially virtual reality and gaming is substantial.
