- The paper introduces a deformable point-based representation that leverages learned blendshapes and skinning weights to map canonical points to a deformed space efficiently.
- It disentangles intrinsic albedo and shading, enabling effective relighting and more realistic rendering of 3D head avatars.
- The approach achieves a six-fold reduction in training time while accurately capturing complex details like hair and eyeglasses from monocular videos.
Analyzing "PointAvatar: Deformable Point-based Head Avatars from Videos"
This paper presents a novel approach to generating animatable and relightable 3D head avatars, called PointAvatar. The method builds a deformable point-based representation from monocular RGB videos captured with commodity devices such as smartphones and webcams, or sourced from the internet. The proposed system addresses several limitations of existing methods that rely on either explicit 3D morphable models (3DMMs) or neural implicit representations.
Methodology
The central innovation of PointAvatar is its point-based representation, which bridges the gap between existing mesh- and implicit-based methods. Mesh-based methods are constrained by fixed topology, which prevents them from adapting to complex or evolving geometry such as eyeglasses or detailed hairstyles. Neural implicit representations capture geometric detail more faithfully, but they are expensive to render and cumbersome to deform.
PointAvatar instead represents the avatar as an explicit, deformable point cloud defined in a canonical space. A continuous deformation field, driven by learned blendshapes and skinning weights, maps the canonical points into the deformed (posed) space. This allows for efficient rendering and straightforward deformation.
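To make the deformation concrete, here is a minimal sketch of a FLAME-style pipeline: each canonical point is offset by learned expression blendshapes and then posed with linear blend skinning. The tensor names, shapes, and the toy data are illustrative assumptions, not the paper's actual interface (the paper learns these quantities with a network in canonical space).

```python
import torch

def deform_points(x_c, blendshapes, skin_weights, expr, joint_T):
    """Illustrative blendshape + linear-blend-skinning deformation.

    x_c:          (N, 3)    canonical point positions
    blendshapes:  (N, 3, E) learned per-point expression basis
    skin_weights: (N, J)    learned per-point skinning weights (rows sum to 1)
    expr:         (E,)      expression coefficients for the current frame
    joint_T:      (J, 4, 4) rigid bone transforms for the current frame
    """
    # Expression blendshapes: offset each point in canonical space.
    x = x_c + torch.einsum('nde,e->nd', blendshapes, expr)

    # Linear blend skinning: blend bone transforms per point, then apply.
    T = torch.einsum('nj,jab->nab', skin_weights, joint_T)    # (N, 4, 4)
    x_h = torch.cat([x, torch.ones(x.shape[0], 1)], dim=-1)   # homogeneous coords
    return torch.einsum('nab,nb->na', T, x_h)[:, :3]          # deformed points

# Toy usage: 1000 points, 10 expression bases, 5 joints at identity pose.
N, E, J = 1000, 10, 5
x_d = deform_points(
    torch.randn(N, 3), torch.randn(N, 3, E) * 0.01,
    torch.softmax(torch.randn(N, J), dim=-1), torch.randn(E),
    torch.eye(4).expand(J, 4, 4),
)
print(x_d.shape)  # torch.Size([1000, 3])
```

Because every operation above is differentiable, the blendshapes and skinning weights can be optimized end-to-end from image losses.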
Key Contributions
- Efficient Representation: The point-based approach makes rendering cheap enough that full images can be rendered at every training step, in contrast to implicit methods, which must evaluate a network at many samples per ray. This efficiency enables powerful image-based losses, markedly enhancing photo-realism (see the rendering sketch after this list).
- Lighting Disentanglement: PointAvatar disentangles the observed color into an intrinsic albedo and a shading component that depends on the normal direction. This separation allows avatars to be re-rendered under new illumination, which is difficult for earlier methods that entangle lighting and color estimation (see the shading sketch following the list).
- Versatile Geometry Handling: The representation allows flexibility in topology and can accurately depict both volume-like structures (e.g., hair) and surface-like geometries (e.g., skin).
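On the efficiency point, below is a hedged sketch of full-image point rendering with a differentiable point rasterizer; PyTorch3D is one library providing such a backend. The image size, point radius, and dummy data are placeholders, not the paper's configuration.

```python
import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    FoVPerspectiveCameras, PointsRasterizationSettings,
    PointsRasterizer, PointsRenderer, AlphaCompositor,
)

# Camera and rasterization settings; all values here are illustrative.
cameras = FoVPerspectiveCameras()
raster_settings = PointsRasterizationSettings(
    image_size=256,       # render the full frame in one pass
    radius=0.01,          # screen-space point radius
    points_per_pixel=10,  # how many points blend into each pixel
)
renderer = PointsRenderer(
    rasterizer=PointsRasterizer(cameras=cameras, raster_settings=raster_settings),
    compositor=AlphaCompositor(),
)

# A dummy deformed point cloud with per-point RGB features.
points = torch.randn(10_000, 3) * 0.1 + torch.tensor([0.0, 0.0, 2.0])
colors = torch.rand(10_000, 3).requires_grad_(True)
cloud = Pointclouds(points=[points], features=[colors])

image = renderer(cloud)                # (1, 256, 256, 3) full rendered frame
target = torch.rand_like(image)        # stand-in for a ground-truth video frame
loss = (image - target).abs().mean()   # image-space L1 loss over the whole frame
loss.backward()                        # gradients flow back to the point features
```

Rendering the whole frame per step is what makes perceptual and other image-level losses affordable during training.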
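For the lighting disentanglement, one simple instantiation is to modulate a per-point albedo by a shading term computed from the normal, for example with low-order spherical harmonics. This is an assumption for illustration; the paper's shading model may differ in detail.

```python
import torch

def shaded_color(albedo, normals, sh_coeffs):
    """Illustrative albedo/shading split (not necessarily the paper's model).

    albedo:    (N, 3) intrinsic RGB albedo per point
    normals:   (N, 3) unit normals in world space
    sh_coeffs: (9,)   monochrome spherical-harmonic lighting coefficients
    """
    x, y, z = normals.unbind(-1)
    # Real SH basis up to order 2, evaluated at the normals
    # (constant normalization factors omitted for brevity).
    sh_basis = torch.stack([
        torch.ones_like(x), y, z, x,
        x * y, y * z, 3 * z**2 - 1, x * z, x**2 - y**2,
    ], dim=-1)                                              # (N, 9)
    shading = (sh_basis * sh_coeffs).sum(-1, keepdim=True)  # (N, 1)
    return albedo * shading.clamp(min=0.0)                  # (N, 3) final color
```

Under this factorization, relighting reduces to swapping in new `sh_coeffs` while keeping the learned per-point albedo fixed.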
Results
The system demonstrates superior quality when generating 3D avatars from monocular inputs, particularly in scenarios where existing methods falter. It handles challenging features such as thin hair strands and eyeglasses, and it trains significantly faster, with a reported six-fold reduction in training time compared to existing implicit methods.
Implications and Future Directions
PointAvatar's advancements have meaningful implications for applications in communication and entertainment, particularly in environments like the metaverse. The clear differentiation between shading and albedo suggests potential avenues for enhanced relighting capabilities. While the approach effectively models intricate details, future research might explore disentangling surface reflectance properties to refine relighting. Moreover, adaptive point sizing could further reduce computational overhead while improving detail accuracy.
In conclusion, the paper provides compelling evidence for the efficacy of point-based representations in 3D avatar creation, challenging the current paradigms of explicit and implicit methods. Its efficiency in rendering and capacity to handle complex topologies represent significant strides in the evolution of digital human modeling.