- The paper presents a diffusion-based framework that bypasses precise intrinsic decomposition, achieving high-fidelity face relighting.
- The method encodes 2D facial images into feature vectors and modifies lighting information using both spatial and non-spatial conditioning.
- Experimental results on Multi-PIE benchmarks show that DiFaReli reliably outperforms state-of-the-art techniques in preserving facial details and relighting quality.
Background
Conventional face relighting methods often require complex estimations of facial geometry, albedo, and lighting parameters, as well as an understanding of the interaction between these components, such as cast shadows and global illumination. Prior approaches have faced challenges in handling non-diffuse effects and are typically dependent on the accuracy of estimated intrinsic components, which can be error-prone, particularly in real-world scenarios.
Diffusion-Based Approach
The paper "DiFaReli: Diffusion Face Relighting" introduces a novel framework that bypasses the need for precise intrinsic decomposition by leveraging diffusion models. The authors propose a conditional denoising diffusion implicit model (DDIM) that combines spatial and non-spatial conditioning to relight faces without accurately estimated intrinsic components or 3D and lighting ground truth.
The primary innovation of the paper lies in using a modified DDIM, trained solely on 2D images, to both decode and implicitly learn the complex interactions between light and facial geometry. The approach utilizes off-the-shelf estimators for input encoding, avoiding the need for multi-view or light stage data typically required by traditional methods.
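To make the role of the DDIM concrete, here is a minimal sketch of a single deterministic DDIM reverse step (the eta = 0 case). It assumes the conditional network's noise prediction `eps` is given; in DiFaReli that prediction would be conditioned on the light and identity encodings, which are abstracted away here.

```python
import numpy as np

def ddim_step(x_t, eps, abar_t, abar_prev):
    """One deterministic DDIM (eta = 0) reverse step.

    x_t:       current noisy sample
    eps:       noise predicted by the conditional network (in DiFaReli the
               conditioning would carry the light encoding; abstracted here)
    abar_t:    cumulative alpha-bar product at step t
    abar_prev: cumulative alpha-bar product at step t-1
    """
    # Predicted clean image implied by the noise estimate.
    x0 = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Move deterministically toward step t-1 along the same noise direction.
    return np.sqrt(abar_prev) * x0 + np.sqrt(1.0 - abar_prev) * eps
```

Because the step is deterministic, running it forward and backward with the same conditioning is what lets the model encode an image and decode a modified version of it.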
Methodology
DiFaReli's approach relies on encoding the input image into a feature vector that disentangles the light information from other facial attributes. During relighting, the light encoding within this vector is modified, and the modified vector is then decoded to obtain the relit image, preserving the subject's identity and details. Spatial conditioning takes the form of a shading reference image, spatially aligned with the input's geometry and lighting, while non-spatial conditioning incorporates facial identity and cast shadow intensity.
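The encode-modify-decode loop described above can be sketched schematically. The encoder and decoder names below are hypothetical stand-ins, not the paper's actual networks; the point is only that relighting reduces to swapping one entry of the disentangled conditioning.

```python
def encode(image, light_estimator, identity_estimator):
    """Disentangle the input into a light code and identity features.

    Both estimators are placeholders for the off-the-shelf networks
    the paper uses for input encoding.
    """
    return {
        "light": light_estimator(image),        # e.g. SH lighting coefficients
        "identity": identity_estimator(image),  # identity/detail features
    }

def relight(image, target_light, estimators, ddim_decode):
    """Relight by swapping only the light encoding, then decoding."""
    cond = encode(image, *estimators)
    cond["light"] = target_light  # everything else is left untouched
    return ddim_decode(image, cond)
```

Because only the light entry changes, the decoded image inherits the subject's identity and fine details from the unmodified parts of the conditioning.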
Key to this method is the use of spherical harmonic lighting to condition the generative process, alongside shape and camera parameters inferred from 3D estimators. Unlike direct rendering, this conditioning only approximates the target illumination, leaving the diffusion model to implicitly capture complex illumination effects. The authors also introduce spatial modulation weights that correlate the conditioning with pixel intensities, giving the diffusion model an easier conditioning signal to learn from.
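A spherical-harmonic shading reference like the one used for spatial conditioning can be computed from surface normals and nine SH lighting coefficients. The sketch below uses the standard real SH normalization constants; the exact basis and conventions in the paper may differ.

```python
import numpy as np

def sh_basis(normals):
    """First 9 real spherical-harmonic basis values, (N, 3) -> (N, 9).

    Constants follow the standard real SH normalization.
    """
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    c0 = 0.28209479177  # 1 / (2 * sqrt(pi))
    c1 = 0.48860251190  # sqrt(3 / (4 * pi))
    c2 = 1.09254843059  # sqrt(15 / (4 * pi))
    c3 = 0.31539156525  # sqrt(5 / (16 * pi))
    c4 = 0.54627421529  # sqrt(15 / (16 * pi))
    return np.stack([
        np.full_like(x, c0),          # l=0
        c1 * y, c1 * z, c1 * x,       # l=1
        c2 * x * y, c2 * y * z,       # l=2
        c3 * (3 * z**2 - 1),
        c2 * x * z,
        c4 * (x**2 - y**2),
    ], axis=1)

def shade(normals, sh_coeffs):
    """Diffuse shading: per-pixel dot product of SH basis with lighting."""
    return sh_basis(normals) @ sh_coeffs
```

Applied to normals from an estimated 3D face shape, this yields a shading image aligned with the input's geometry, which is what the spatial conditioning branch consumes.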
Results
Experimental evaluations on standard benchmarks like Multi-PIE demonstrate that DiFaReli can photorealistically relight images, significantly outperforming state-of-the-art models in both qualitative and quantitative evaluations. The approach provides high fidelity in relighting and shadow manipulation while maintaining the subject's original facial details, which are often compromised by alternative methods.
Conclusion
The "DiFaReli: Diffusion Face Relighting" paper presents a groundbreaking diffusion-based framework that tackles the longstanding challenges in face relighting with state-of-the-art performance. By leveraging the power of diffusion models calibrated by light and shadow encodings, this method promises significant advancements in applications requiring photorealistic illumination conditions on faces, such as augmented reality and portrait photography.