NPGA: Neural Parametric Gaussian Avatars

(2405.19331)
Published May 29, 2024 in cs.CV , cs.AI , and cs.GR

Abstract

The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian Splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.

Avatar optimization using multi-view video, MonoNPHM tracking, forward-deformation prior, and Gaussian point cloud.

Overview

  • The paper 'NPGA: Neural Parametric Gaussian Avatars' introduces a method for creating high-fidelity, controllable digital avatars using neural parametric head models (NPHM) and 3D Gaussian splatting for efficient rendering.

  • NPGA captures avatar dynamics through a canonical Gaussian point cloud enhanced with latent features and employs a cycle-consistency distillation technique to align neural deformations with rasterization-based rendering.

  • The authors demonstrate NPGA's effectiveness by achieving significant improvements over existing methods across multiple evaluation metrics, and outline future directions in avatar modeling and data utilization.

In the paper "NPGA: Neural Parametric Gaussian Avatars," the authors present a method for creating high-fidelity, controllable digital avatars. The approach harnesses multi-view video recordings to enable seamless integration of virtual avatars into various applications, including AR/VR, teleconferencing, and digital media.

This effort is driven by the inherent challenges of realistic avatar creation: photo-realism and real-time rendering performance. The authors introduce Neural Parametric Gaussian Avatars (NPGA), which leverage 3D Gaussian splatting for efficient rendering and condition avatar dynamics on a neural parametric head model (NPHM). This diverges from traditional 3D morphable models (3DMMs), which are mesh-based and limited by their linear nature; instead, NPGA capitalizes on the NPHM's richer expression space to capture more nuanced dynamic behavior.

Methodology

The proposed method is built around a canonical Gaussian point cloud augmented with per-primitive latent features. These features govern the dynamic behavior of the avatars, providing enriched representational capacity. The dynamics module, a key component of NPGA, consists of two multi-layer perceptrons (MLPs): the network $F$ handles coarse, prior-based deformation, while the network $G$ captures finer, expression-dependent details beyond this prior. To keep this added expressivity well-behaved, Laplacian regularization terms are applied to the latent features and the predicted dynamics.
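
To make the two-network design concrete, the following NumPy sketch deforms canonical Gaussian centers with a coarse prior network $F$ and a fine residual network $G$ conditioned additionally on per-primitive latent features. This is a minimal illustration, not the authors' implementation: all layer widths, feature dimensions, and the random initializations are hypothetical.

```python
import numpy as np

def mlp(params, x):
    """Apply a small MLP given a list of (W, b) layer parameters (ReLU hidden layers)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def init_mlp(rng, dims):
    """Randomly initialize (W, b) pairs for consecutive layer sizes in `dims`."""
    return [(rng.standard_normal((i, o)) * 0.01, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

rng = np.random.default_rng(0)
N, D_expr, D_feat = 1024, 16, 8          # hypothetical sizes
xyz = rng.standard_normal((N, 3))        # canonical Gaussian centers
z = rng.standard_normal((N, D_feat))     # per-primitive latent features
expr = rng.standard_normal(D_expr)       # expression code from the NPHM tracker

# F: coarse, prior-based forward deformation (position + expression -> offset)
F = init_mlp(rng, [3 + D_expr, 64, 3])
# G: fine residual, additionally conditioned on the per-primitive feature z
G = init_mlp(rng, [3 + D_expr + D_feat, 64, 3])

e = np.broadcast_to(expr, (N, D_expr))
coarse = mlp(F, np.concatenate([xyz, e], axis=1))
fine = mlp(G, np.concatenate([xyz, e, z], axis=1))
deformed = xyz + coarse + fine           # posed Gaussian centers
print(deformed.shape)  # (1024, 3)
```

The deformed centers would then be rasterized with standard Gaussian splatting; the decomposition into a prior-driven term plus a learned residual mirrors the paper's coarse/fine split.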

A novel strategy called cycle-consistency distillation is employed to convert the backward deformations inherent in NPHM to forward deformations, making them compatible with rasterization-based rendering. This technique optimizes the network $F$ to act as the inverse of the NPHM backward deformation, ensuring that the facial dynamics remain aligned with the neural parametric model.
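
The distillation objective can be sketched as a cycle loss: the forward network maps canonical points into posed space, the frozen NPHM backward field maps them back, and any mismatch with the original canonical points is penalized. The backward field below is a toy analytic stand-in (the real one is a learned network), and the exact-inverse forward function exists only to show the loss reaching zero.

```python
import numpy as np

def nphm_backward(x_posed, expr):
    """Toy stand-in for the frozen NPHM backward deformation field
    (posed space -> canonical space); the real field is a learned network."""
    return x_posed * (1.0 - 0.1 * np.sin(expr.sum()))

def cycle_loss(forward_fn, x_canon, expr):
    """Cycle-consistency distillation objective: the forward deformation
    should be undone by the NPHM backward field, returning each point
    to its canonical position."""
    x_posed = forward_fn(x_canon, expr)     # candidate forward deformation
    x_cycle = nphm_backward(x_posed, expr)  # frozen backward field
    return np.mean(np.sum((x_cycle - x_canon) ** 2, axis=-1))

rng = np.random.default_rng(0)
x_canon = rng.standard_normal((256, 3))     # canonical Gaussian centers
expr = np.array([0.5, 0.7])                 # toy expression code

# An exact inverse of the toy backward field drives the loss to ~0,
# while an identity "forward" leaves a nonzero cycle residual.
exact_forward = lambda x, e: x / (1.0 - 0.1 * np.sin(e.sum()))
identity = lambda x, e: x
print(cycle_loss(exact_forward, x_canon, expr), cycle_loss(identity, x_canon, expr))
```

In the actual method, minimizing this loss over the canonical point cloud distills the backward field into a forward deformation usable by rasterization-based rendering.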

Implementation and Evaluation

The authors evaluate their approach on the public NeRSemble dataset, demonstrating significant improvements over existing methods. NPGA outperforms the GaussianAvatars and GaussianHeadAvatar baselines on the self-reenactment task, improving PSNR by roughly 2.6 dB alongside notable gains in SSIM and LPIPS. Additionally, NPGA performs robustly in cross-reenactment scenarios and can be animated from monocular RGB tracking of real-world videos.
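
For reference, the PSNR numbers reported here follow the standard definition; a minimal implementation (not code from the paper) is:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

# A uniform 0.1 error on a [0, 1]-valued image gives MSE = 0.01, i.e. ~20 dB.
pred = np.full((64, 64), 0.5)
target = pred + 0.1
print(psnr(pred, target))  # ~20.0
```

Because PSNR is logarithmic in the mean squared error, a 2.6 dB gain corresponds to a sizable reduction in per-pixel reconstruction error.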

Results

The evaluation highlights NPGA's capacity for creating avatars with higher fidelity and nuanced dynamic expressions, outperforming baselines in both qualitative and quantitative measures. For instance, NPGA achieves an average PSNR of 37.68 in novel-view synthesis, compared to 33.92 for GaussianHeadAvatar (GHA) and 33.42 for Mixture of Volumetric Primitives (MVP). These improvements reflect the effective integration of per-primitive latent features and the cycle-consistency distillation.

Implications and Future Work

The implications of this research are significant for the future development of digital avatars and related technologies. By leveraging a neural parametric model, NPGA provides a more expressive and controllable framework for avatar animation. This can foster advancements in immersive applications spanning gaming, virtual environments, and telepresence.

Moving forward, the authors suggest extending the underlying parametric head model to cover regions it currently represents poorly, such as the neck and torso. Additionally, they see potential in adopting large-scale multi-view datasets to further improve the fidelity and generalization of the neural models used in avatar creation.

In summary, "NPGA: Neural Parametric Gaussian Avatars" offers a compelling solution to the challenge of creating high-fidelity digital avatars, integrating efficient rendering techniques with advanced neural parametric models to achieve superior dynamic expressivity and visual realism. The approach sets a new benchmark in the quest for responsive and lifelike virtual human representations.
