
On Learning 3D Face Morphable Model from In-the-wild Images (1808.09560v2)

Published 28 Aug 2018 in cs.CV

Abstract: As a classic statistical model of 3D facial shape and albedo, 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting, image synthesis. Conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape and albedo parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameter, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing.

Citations (151)

Summary

  • The paper introduces a nonlinear 3DMM that learns from in-the-wild images, eliminating the need for expensive 3D face scans.
  • The proposed method utilizes an encoder-decoder architecture with a differentiable rendering layer to accurately convert 2D images into detailed 3D faces.
  • Quantitative and qualitative evaluations highlight superior performance in face alignment and reconstruction compared to traditional linear models.

Learning a Nonlinear 3D Morphable Model from In-the-wild Images

The paper "On Learning 3D Face Morphable Model from In-the-wild Images" presents an innovative approach to developing a nonlinear 3D Morphable Model (3DMM) using only in-the-wild images. This circumvents the traditional requirement for 3D face scans, which are expensive and laborious to collect. The framework leverages the power of Deep Neural Networks (DNNs) to achieve end-to-end trainability in a weakly supervised manner, thereby addressing the limitations of previous linear models.

Proposed Nonlinear 3D Morphable Model

Framework Overview

The proposed framework comprises an encoder and two decoders that together form the nonlinear 3DMM. The encoder estimates projection, lighting, shape, and albedo parameters from a 2D face image. The two decoders act as nonlinear mappings, converting the shape and albedo parameters into a 3D shape and an albedo map, respectively. Crucially, a differentiable rendering layer reconstructs the input face image by combining the estimated 3D shape and albedo with the lighting and projection parameters; this layer is pivotal for end-to-end training of the model in a weakly supervised setting (Figure 1).

Figure 1: Conventional 3DMM employs linear bases models for shape/albedo, which are trained with 3D face scans and associated controlled 2D images. We propose a nonlinear 3DMM to model shape/albedo via deep neural networks (DNNs). It can be trained from in-the-wild face images without 3D scans, and also better reconstruct the original images due to the inherent nonlinearity.
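The encode-decode-render data flow can be sketched in plain Python. Every dimension, name, and function body below is an illustrative placeholder, not the paper's actual architecture; in the paper each component is a trained deep network.

```python
def encoder(image):
    """Hypothetical encoder: regresses projection m, lighting L, and the
    shape/albedo latent vectors f_S, f_A from a 2D face image (stub)."""
    return {"m": [0.0] * 8,      # projection parameters (assumed size)
            "L": [0.0] * 9,      # spherical-harmonics lighting coefficients
            "f_S": [0.0] * 160,  # shape latent code (assumed size)
            "f_A": [0.0] * 160}  # albedo latent code (assumed size)

def shape_decoder(f_S, n_vertices=4):
    """Nonlinear shape decoder: latent code -> per-vertex 3D positions (stub)."""
    return [[0.0, 0.0, 0.0] for _ in range(n_vertices)]

def albedo_decoder(f_A, n_vertices=4):
    """Nonlinear albedo decoder: latent code -> per-vertex RGB albedo (stub)."""
    return [[0.5, 0.5, 0.5] for _ in range(n_vertices)]

def render(m, L, shape, albedo):
    """Differentiable rendering layer: project and shade the mesh back into
    an image (placeholder that simply returns the albedo unchanged)."""
    return [list(v) for v in albedo]

def reconstruct(image):
    """One forward pass: image -> parameters -> 3D shape/albedo -> image."""
    p = encoder(image)
    S = shape_decoder(p["f_S"])
    A = albedo_decoder(p["f_A"])
    return render(p["m"], p["L"], S, A)
```

In training, the output of `reconstruct` would be compared against the input image, and the reconstruction error backpropagated through the rendering layer into both decoders and the encoder.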

Shape and Albedo Representation

The nonlinear 3DMM learns representations directly from large collections of in-the-wild images, circumventing the traditional need for 3D face scans. The proposed model enhances representation power by replacing the PCA-based linear bases with deep convolutional networks. The shape and albedo are represented as 2D images, maintaining spatial relationships and leveraging CNNs' capability in image synthesis (Figure 2).

Figure 2: Jointly learning a nonlinear 3DMM and its fitting algorithm from unconstrained 2D in-the-wild face image collection, in a weakly supervised fashion.
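Representing 3D shape as a 2D image can be illustrated with a toy "position map": each UV pixel stores the (x, y, z) coordinates of the vertex mapped there, so a convolutional decoder can output it like any image. The nearest-pixel splatting below is a deliberate simplification (assumed for illustration) of a real UV rasterization that would interpolate across triangles.

```python
def shape_to_uv_map(vertices, uv_coords, h=8, w=8):
    """Splat per-vertex 3D positions into an h x w position map whose three
    channels hold (x, y, z). uv_coords are per-vertex (u, v) in [0, 1)."""
    uv_map = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for (x, y, z), (u, v) in zip(vertices, uv_coords):
        col = min(w - 1, int(u * w))   # nearest pixel, no interpolation
        row = min(h - 1, int(v * h))
        uv_map[row][col] = [x, y, z]
    return uv_map
```

Keeping neighboring surface points in neighboring pixels is what lets the convolutional decoders exploit spatial locality when generating shape and albedo.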

Differentiable Rendering Layer

A novel differentiable rendering layer enables accurate reconstruction of the face images. This layer combines shading and albedo, approximating lighting effects with spherical harmonics. As a result, the networks can be trained using only 2D image supervision while still generating realistic textures and reconstructions.
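A minimal sketch of spherical-harmonics shading under the usual Lambertian assumption follows; the normalization constants of the nine basis functions are folded into the lighting coefficients (a common simplification when those coefficients are regressed by a network, and an assumption here rather than the paper's exact formulation).

```python
def sh_basis(n):
    """Nine-term spherical-harmonics basis evaluated at a unit normal
    n = (nx, ny, nz); constant factors folded into the lighting vector."""
    nx, ny, nz = n
    return [1.0, nx, ny, nz,
            nx * ny, nx * nz, ny * nz,
            nx * nx - ny * ny, 3.0 * nz * nz - 1.0]

def shade(albedo, normal, light):
    """Lambertian shading: per-channel texture = albedo * (light . H(normal)),
    where light is the 9-vector of SH lighting coefficients."""
    s = sum(l * h for l, h in zip(light, sh_basis(normal)))
    return [a * s for a in albedo]
```

With a purely ambient light (only the constant coefficient nonzero), the shaded texture reduces to the albedo itself, which is why disentangling shading from albedo needs the regularizers discussed below.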

Model Learning and Regularization

The network is trained end-to-end by minimizing a combination of loss functions: reconstruction, landmark, and regularization terms. Regularization (albedo symmetry, albedo constancy, and shape smoothness constraints) keeps the reconstructions plausible. Training first employs intermediate supervision using pseudo-groundtruth from the 300W dataset, then switches to optimizing the full model for improved performance (Figure 3).

Figure 3: Effect of albedo regularizations: albedo symmetry (sym) and albedo constancy (const). When there is no regularization being used, shading is mostly baked into the albedo. Using the symmetry property helps to resolve the global lighting. Using constancy constraint further removes shading from the albedo, which results in a better 3D shape.
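The regularizers can be made concrete with a toy albedo-symmetry term over a UV albedo map, plus a weighted loss combination. The weights below are illustrative placeholders, not the paper's values.

```python
def albedo_symmetry_loss(albedo_uv):
    """Mean |A - flip(A)| over a 2D albedo map: faces are roughly bilaterally
    symmetric, so the albedo should be close to its horizontal flip."""
    total, n = 0.0, 0
    for row in albedo_uv:
        for a, b in zip(row, row[::-1]):
            total += abs(a - b)
            n += 1
    return total / n

def total_loss(l_rec, l_lm, l_sym, l_const, l_smooth,
               w_lm=0.5, w_sym=0.1, w_const=0.1, w_smooth=0.1):
    """Weighted sum of reconstruction, landmark, and regularization losses.
    The weights are hypothetical, chosen only to show the structure."""
    return (l_rec + w_lm * l_lm + w_sym * l_sym
            + w_const * l_const + w_smooth * l_smooth)
```

A perfectly symmetric albedo map incurs zero symmetry loss, while shading baked into the albedo (which is typically asymmetric under directional light) is penalized, matching the effect shown in Figure 3.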

Applications and Comparisons

Applications

The nonlinear 3DMM framework supports various applications, such as 2D face alignment, 3D reconstruction, and face editing. For instance, the model can generate realistic face reconstructions even under extreme poses and lighting conditions, demonstrating its robustness.

Qualitative and Quantitative Comparisons

The paper performs extensive evaluations, showcasing the superiority of the nonlinear 3DMM over traditional linear models in terms of expressiveness and representation power. Quantitative analyses, such as Normalized Mean Error (NME) comparisons on the AFLW2000 and Florence datasets, show significant improvements in face alignment and reconstruction tasks over existing methods (Figure 4).


Figure 4: Shape representation power comparison on Basel scans. Our nonlinear model is able to reconstruct input 3D scans with smaller errors than the linear model (l_S = 160 for both models). The error map shows the normalized per-vertex errors.
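NME is the mean Euclidean landmark error divided by a normalization factor (for example, a face bounding-box size), usually reported as a percentage. The exact normalization varies by benchmark, so the sketch below shows one common convention rather than the paper's precise protocol.

```python
import math

def nme(pred, gt, d):
    """Normalized Mean Error: mean landmark-to-ground-truth distance,
    divided by normalization term d, expressed in percent."""
    errs = [math.dist(p, g) for p, g in zip(pred, gt)]
    return 100.0 * sum(errs) / (len(errs) * d)
```

For two landmarks with errors 0 and 5 pixels and a 100-pixel normalization, the NME is 2.5%.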

Conclusion

The paper establishes a new paradigm for learning 3DMMs, efficiently using in-the-wild face images and deep neural networks to achieve impressive gains in representation power and model fitting. It indicates a promising direction for future research in unsupervised or weakly supervised learning of 3D models from large-scale 2D datasets, potentially expanding applications to further domains outside facial analysis. The results demonstrate the potential of nonlinear models to overcome the limitations of linear methods, especially for tasks involving complex real-world data.

Authors (2)
