GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians (2312.02069v2)

Published 4 Dec 2023 in cs.CV

Abstract: We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by a local coordinate frame of a triangle and optimize for explicit displacement offset to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize for the morphable model parameters and Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin.

Authors (6)

Shenhan Qian
Tobias Kirschstein
Liam Schoneveld
Davide Davoli
Simon Giebenhain
Matthias Nießner

Summary

The paper introduces a novel approach using rigged 3D Gaussian splats attached to FLAME meshes to generate photorealistic, animatable head avatars.
It outperforms baseline models in novel-view synthesis and reenactment with significant improvements in PSNR, SSIM, and LPIPS metrics.
It captures fine facial detail for high-fidelity animation, with potential applications in immersive media and virtual reality.

Summary of "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"

The paper "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians" presents an approach to generating photorealistic, animatable head avatars from multi-view video data. The method attaches 3D Gaussian splats to a parametric morphable face model, which gives precise control over the avatar's pose, expression, and viewpoint. The aim is to overcome the limited animation controllability of existing representations such as Neural Radiance Fields (NeRF) and its derivatives.

Methodology

The core of the technique is a dynamic 3D representation built from Gaussian splats, rendering primitives that support real-time rendering. The splats move with an underlying parametric face model, specifically the FLAME mesh, which combines photorealism with controllability. The morphable model parameters and the Gaussian splat parameters are optimized jointly, end to end, to yield a more accurate geometric representation. This allows the avatar to capture subtle facial details, such as wrinkles or mouth interiors, which are notoriously difficult to model.

A key innovation lies in the way Gaussian splats are rigged to the mesh triangles. Each triangle in the FLAME mesh is initialized with a Gaussian splat at its center. Each splat is parameterized in the local coordinate frame of its parent triangle, with an explicit displacement offset that is optimized, so that when the mesh animates the splat follows the triangle's transformation. This rigging keeps the correspondence between splats and mesh consistent during animation. The paper also introduces a "binding inheritance strategy" to maintain controllability when adaptive density control adds or removes splats during optimization.
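The rigging and binding inheritance described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `triangle_frame`, `local_to_global`, `Splat`, and `split_splat` are hypothetical, the frame construction (one edge direction, the triangle normal, and their cross product, with a scale derived from an edge length and the triangle height) is one plausible choice, and the composition of splat rotations with the triangle rotation is omitted for brevity.

```python
import math
from dataclasses import dataclass


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def triangle_frame(v0, v1, v2):
    """Local frame of a non-degenerate triangle: (origin, axes, scale).

    Assumed construction: origin at the centroid, axes from one edge
    direction and the normal, scale as the mean of that edge's length
    and the triangle height above it.
    """
    origin = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    edge = _sub(v1, v0)
    e1 = _unit(edge)                          # along the edge v0 -> v1
    n = _unit(_cross(e1, _sub(v2, v0)))       # triangle normal
    e2 = _cross(n, e1)                        # in-plane, perpendicular to e1
    edge_len = math.sqrt(_dot(edge, edge))
    height = abs(_dot(_sub(v2, v0), e2))
    k = 0.5 * (edge_len + height)
    return origin, (e1, e2, n), k


@dataclass
class Splat:
    triangle: int        # index of the parent triangle (the "binding")
    mu_local: tuple      # position in the triangle's local frame
    scale_local: tuple   # per-axis scale in the local frame


def local_to_global(splat, vertices, faces):
    """Map a splat's local position and scale to world space."""
    v0, v1, v2 = (vertices[i] for i in faces[splat.triangle])
    origin, axes, k = triangle_frame(v0, v1, v2)
    mu = tuple(
        origin[d] + k * sum(splat.mu_local[j] * axes[j][d] for j in range(3))
        for d in range(3)
    )
    scale = tuple(k * s for s in splat.scale_local)
    return mu, scale


def split_splat(parent, offset, shrink=1.6):
    """Split one splat into two children that inherit the parent's binding.

    `offset` and `shrink` are illustrative parameters, not values from the paper.
    """
    children = []
    for sign in (1.0, -1.0):
        mu = tuple(m + sign * o for m, o in zip(parent.mu_local, offset))
        scale = tuple(s / shrink for s in parent.scale_local)
        children.append(Splat(parent.triangle, mu, scale))
    return children
```

Because each child keeps the `triangle` index of its parent, every splat created during densification still moves with the mesh when the FLAME parameters change, which is the property the binding inheritance strategy is meant to preserve.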

Numerical Results and Claims

Evaluations on video recordings of nine subjects show that the method produces more photorealistic results than baseline methods in novel-view synthesis and self-reenactment. The image-quality metrics PSNR, SSIM, and LPIPS all indicate significant improvements in rendering quality. According to the results, GaussianAvatars improves quality for novel views and expressions and also holds up in complex reenactments with high temporal consistency.
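Of the three metrics, PSNR is the simplest to state: it is 10·log10(MAX² / MSE), where MSE is the mean squared error between the rendered and reference pixels and MAX is the largest possible pixel value. The sketch below is a generic illustration of that formula on flat lists of pixel intensities; the function name `psnr` and the assumption that intensities lie in [0, 1] are choices made here, not details of the paper's evaluation code.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR and SSIM mean a closer match to the reference image, while for LPIPS, a learned perceptual distance, lower is better.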

Theoretical and Practical Implications

GaussianAvatars contributes to both theoretical and practical domains of computer vision and graphics. Theoretically, it extends dynamic radiance field modeling by rigging 3D Gaussian splats to a parametric mesh, showing how an explicit representation can balance geometric fidelity with rendering simplicity. Practically, the ability to construct and control realistic human head avatars has direct implications for the media and entertainment industries, including immersive virtual reality, gaming, telepresence, and cinematic production.

The model's limitations suggest areas for future exploration. Relighting is difficult because the appearance is baked into a radiance field. Control is also limited for components not directly modeled by the FLAME mesh, such as hair and dynamic environmental interactions. Future research could address these challenges and integrate other body parts or environmental factors into the framework.

Future Outlook

The advancement of techniques like GaussianAvatars paves the way for a new generation of AI-driven graphics where personalized and dynamic avatar creation becomes more accessible. As computational resources and techniques improve, the insights gleaned from GaussianAvatars could inspire more robust, scalable, and interactive model designs. Researchers and developers may explore expanding this methodology to full-body animations or integrating advanced texture and lighting models for even more realistic rendering.

Overall, "GaussianAvatars" represents a significant step forward in fine-grained, photorealistic avatar creation. It offers a concrete method for controllable, mesh-rigged splat avatars and a foundation for further work on synthesized animation across AI and computer graphics.
