Emergent Mind

Deformable 3D Gaussian Splatting for Animatable Human Avatars

(2312.15059)
Published Dec 22, 2023 in cs.CV and cs.AI

Abstract

Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatar's learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.

Overview

  • Introduces ParDy-Human, a novel approach for generating animatable 3D human avatars with minimal input by using deformable 3D Gaussian Splatting.

  • ParDy-Human employs fewer camera views and does not require human segmentation masks or other complex annotations for model training.

  • The method uses SMPL for deforming canonical 3D Gaussians and predicts deformations beyond simple vertex manipulations to animate avatars.

  • Demonstrates efficiency in generating high-resolution avatar renderings on consumer-grade hardware with both dense and sparse input images.

  • Acknowledges limitations and ethical considerations but provides a significant step forward in reduced-data avatar generation and animation.

Introduction to 3D Avatars and Rendering

Creating realistic 3D human avatars from images is a significant task in visual media with applications spanning animation, virtual reality, and interactive gaming. Traditionally, generating animatable avatars has been a complex task requiring numerous camera viewpoints and particular annotations including human masks, UV maps, and depth maps.

Pioneering a New Avatar Approach

The paper introduces ParDy-Human, a novel explicit approach to generate animatable human avatars with minimal input requirements. Whereas existing solutions often depend on dense camera views and complex annotations, ParDy-Human requires considerably fewer inputs to accomplish the task. It does so by introducing deformable 3D Gaussian Splatting, which deforms 3D Gaussians in accordance with a human pose model to animate avatars. The method integrates two primary parts: the first module deforms canonical 3D Gaussians according to the vertices of the SMPL (Skinned Multi-Person Linear) model, while the second takes designed joint encodings and predicts per-Gaussian deformations that account for dynamics beyond simple vertex manipulations.
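The two-stage idea can be sketched in a few lines: first skin the canonical Gaussian centers with SMPL-style linear blend skinning, then add a learned per-Gaussian residual. This is an illustrative simplification, not the paper's implementation; the function names, the nearest-vertex weight copying, and the `residual_fn` interface are assumptions made here for clarity.

```python
import numpy as np

def deform_gaussians(canonical_means, lbs_weights, joint_transforms,
                     residual_fn=None, joint_encodings=None):
    """Deform canonical 3D Gaussian centers, conceptually following ParDy-Human.

    Stage 1: linear blend skinning with SMPL-style per-joint transforms
             (weights could e.g. be copied from the nearest SMPL vertex).
    Stage 2: optional per-Gaussian residual offsets predicted from joint
             encodings, covering dynamics beyond rigid skinning.

    canonical_means:  (N, 3) Gaussian centers in the canonical pose.
    lbs_weights:      (N, J) skinning weights, rows summing to 1.
    joint_transforms: (J, 4, 4) rigid transforms of the pose model's joints.
    residual_fn:      hypothetical network mapping encodings -> (N, 3) offsets.
    """
    n = canonical_means.shape[0]
    homo = np.concatenate([canonical_means, np.ones((n, 1))], axis=1)  # (N, 4)
    # Blend per-joint transforms with the skinning weights: (N, 4, 4).
    blended = np.einsum("nj,jab->nab", lbs_weights, joint_transforms)
    posed = np.einsum("nab,nb->na", blended, homo)[:, :3]              # (N, 3)
    if residual_fn is not None:
        # Non-rigid correction on top of the skinned positions.
        posed = posed + residual_fn(joint_encodings)
    return posed
```

In the full method the rotation part of the blended transform would also re-orient each Gaussian's covariance before rasterization; the sketch above tracks centers only.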

Model Training and Efficiency

ParDy-Human can be effectively trained without human segmentation masks, using significantly fewer camera views than previous methods. Experimental evidence demonstrates its proficiency in generating realistic avatars from both densely and sparsely captured input images. Of particular note, the method is capable of generating high-resolution renderings efficiently on consumer-grade hardware.
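One way to see why no segmentation mask is needed: the model renders the full frame, so training can be supervised with a plain image-space photometric objective over the whole image, background included. The sketch below uses the L1 + SSIM mix that is common in 3D Gaussian Splatting optimization; the global single-window SSIM here is a deliberate simplification (real implementations use windowed SSIM), and the function name is illustrative, not from the paper.

```python
import numpy as np

def training_loss(rendered, target, lam=0.2):
    """Full-frame photometric loss: (1 - lam) * L1 + lam * (1 - SSIM).

    Because the loss covers every pixel, including the background, no
    human segmentation mask is required to isolate the avatar.
    """
    l1 = np.abs(rendered - target).mean()
    # Simplified global SSIM (one window over the whole image).
    mu_r, mu_t = rendered.mean(), target.mean()
    var_r, var_t = rendered.var(), target.var()
    cov = ((rendered - mu_r) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_r * mu_t + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2))
    return (1 - lam) * l1 + lam * (1 - ssim)
```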

Innovations and Contributions

This work's contributions are manifold. A method for deformable 3D Gaussian splatting yields a parametrized, fully explicit representation for dynamic human avatar animation with reduced training data needs. The approach also accelerates inference considerably, producing full-resolution renderings quickly. Despite these strengths, the paper acknowledges limitations such as potential artifacts on uniformly colored garments, as well as the ethical concerns associated with digital human replication. The proposed framework nonetheless offers a novel pathway for producing animatable human representations, setting the stage for future research and applications in avatar generation and visual effects.
