AG3D: Learning to Generate 3D Avatars from 2D Image Collections

(arXiv:2305.02312)
Published May 3, 2023 in cs.CV

Abstract

While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.

Overview

  • AG3D is a new generative model that synthesizes 3D human avatars using unstructured 2D image collections.

  • The model combines a 3D generator with multiple discriminators and an articulation module for realistic shape and clothing deformation.

  • Empirical results show AG3D outperforms prior methods in both geometry and image quality, as measured by quantitative metrics and user preference scores.

  • By generating diverse and realistic 3D avatars from 2D imagery without direct 3D supervision, this approach has potential applications in virtual environments, gaming, VR experiences, and beyond.

Introduction to AG3D

Advances in generative models, particularly Generative Adversarial Networks (GANs), have produced photorealistic 2D images of many object categories, including clothed humans. However, applications that demand animatable, renderable 3D avatars cannot be served by 2D output alone. Learning to generate 3D human models with diverse appearances is especially challenging when only 2D training data is available. To address this, the authors propose AG3D, a generative model that leverages unstructured 2D image collections to synthesize novel 3D humans.

Model Architecture

AG3D captures the shape and deformation of both the body and loose clothing with a holistic 3D generator, paired with an efficient and flexible articulation module that poses the generated avatar. To improve realism, the model is trained against multiple discriminators, and geometric cues in the form of predicted 2D normal maps provide additional guidance toward more precise shapes.
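
To make the multi-discriminator idea concrete, the following is a minimal PyTorch-style sketch of one adversarial training step with an image discriminator and a normal-map discriminator. Everything here is an illustrative assumption rather than the authors' actual code: the tiny networks stand in for AG3D's 3D generator, articulation module, and renderer, and `predict_normals` stands in for the pretrained 2D normal estimator that supplies geometric cues for real images.

```python
# Hypothetical sketch: adversarial training with an RGB discriminator and a
# normal-map discriminator, using the non-saturating logistic GAN loss.
# The networks below are toy stand-ins, NOT the AG3D architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Stand-in for the holistic 3D generator + articulation + rendering."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Linear(z_dim, 3 * 32 * 32)

    def forward(self, z):
        rgb = self.net(z).view(-1, 3, 32, 32)   # "rendered" image
        normals = torch.tanh(rgb)               # "rendered" normal map (toy)
        return rgb, normals

class TinyDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_ch * 32 * 32, 1))

    def forward(self, x):
        return self.net(x)

def predict_normals(images):
    """Stand-in for a pretrained 2D normal estimator applied to real photos."""
    return torch.tanh(images)

G, d_rgb, d_normal = TinyGenerator(), TinyDiscriminator(), TinyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(
    list(d_rgb.parameters()) + list(d_normal.parameters()), lr=2e-4)

real_images = torch.rand(8, 3, 32, 32)          # batch of unstructured 2D photos
z = torch.randn(8, 64)

# Discriminator step: real vs. fake, for both appearance and geometry.
fake_rgb, fake_normals = G(z)
real_normals = predict_normals(real_images)     # pseudo ground-truth normals
loss_d = (F.softplus(-d_rgb(real_images)) + F.softplus(d_rgb(fake_rgb.detach()))
          + F.softplus(-d_normal(real_normals))
          + F.softplus(d_normal(fake_normals.detach()))).mean()
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool both discriminators at once.
fake_rgb, fake_normals = G(z)
loss_g = (F.softplus(-d_rgb(fake_rgb)) + F.softplus(-d_normal(fake_normals))).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The key design point mirrored here is that the normal-map discriminator gives the generator a gradient signal about geometry, not just appearance, which is what pushes the learned 3D shapes toward plausibility.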

Empirical Results and Findings

Extensive experiments show that AG3D outperforms previous 3D- and articulation-aware approaches, both quantitatively and qualitatively. In a user study (Figure 4 of the paper), participants preferred the shapes and images produced by AG3D over those of EVA3D, with preference scores of 71.7% for shape quality and 81.4% for image quality. In summary, the paper's contributions are: (i) a generative model of articulated 3D humans with state-of-the-art appearance and geometry; (ii) a new generator capable of modeling the shape and deformation of loose clothing; and (iii) specialized discriminators that markedly improve visual and geometric fidelity.

Future Implications

By introducing AG3D, this research lays the groundwork for generating 3D human avatars directly from widespread 2D internet imagery. Such advances could have substantial implications for virtual environments, gaming, VR experiences, and the broader entertainment industry. The ability to model deformations for points far from the body surface, such as loose clothing, is a notable step forward in handling complex wearables, and it expands the potential for avatars that reflect diversity in apparel and presentation.
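
For intuition about articulating points far from the body, the sketch below shows plain forward linear blend skinning (LBS) applied to arbitrary query points. This is background, not AG3D's method: the paper's articulation module is more sophisticated (notably, rendering requires mapping deformed-space points back to canonical space), and all names and shapes here are illustrative assumptions.

```python
# Hypothetical sketch: forward linear blend skinning (LBS) of query points.
import torch

def lbs(points, weights, bone_transforms):
    """
    points:          (N, 3)    canonical-space points (may lie far from the body)
    weights:         (N, B)    per-point skinning weights over B bones (rows sum to 1)
    bone_transforms: (B, 4, 4) rigid transform of each bone
    returns:         (N, 3)    deformed points
    """
    n = points.shape[0]
    homo = torch.cat([points, torch.ones(n, 1)], dim=-1)              # (N, 4)
    # Blend bone transforms per point, then apply the blended transform.
    blended = torch.einsum("nb,bij->nij", weights, bone_transforms)   # (N, 4, 4)
    return torch.einsum("nij,nj->ni", blended, homo)[:, :3]

# Toy usage: three points, two bones; bone 1 is translated along x.
pts = torch.randn(3, 3)
w = torch.softmax(torch.randn(3, 2), dim=-1)
T = torch.eye(4).repeat(2, 1, 1)
T[1, :3, 3] = torch.tensor([0.1, 0.0, 0.0])
print(lbs(pts, w, T))
```

Points without a single well-defined nearest body part (e.g., on a skirt) receive smoothly blended weights, which is the basic mechanism that lets deformation extend to loose clothing.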

In conclusion, AG3D marks a significant stride in generative AI, demonstrating how 2D data can yield high-quality 3D representations without direct 3D supervision. This approach helps bridge the gap between abundant 2D image data and the growing need for diverse, realistic 3D avatars across applications.
