Human101: Training 100+FPS Human Gaussians in 100s from 1 View

(arXiv:2312.15258)
Published Dec 23, 2023 in cs.CV

Abstract

Reconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (i.e., rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality. Code and demos will be released at https://github.com/longxiang-ai/Human101.

Figure: A summary diagram of the Human101 methodology.

Overview

  • Human101 introduces a method for dynamic 3D human reconstruction from single-view videos that trains in roughly 100 seconds without sacrificing quality.

  • The framework uses 3D Gaussian Splatting to render 512-resolution images at 100+ FPS (and 1024-resolution images at 60+ FPS) after roughly 100 seconds of training.

  • Technical innovations include Canonical Human Initialization, Human-centric Forward Gaussian Animation, and Human-centric Gaussian Refinement.

  • Extensive experiments on the ZJU-MoCap and MonoCap datasets show faster training and up to 10x higher frame rates than prior methods, with comparable or better rendering quality, on a single RTX 3090 GPU.

  • The tool offers significant contributions to real-time virtual human modeling, promising advancements in virtual reality and digital human animation.

Introduction to Human101 Framework

In the realm of creating digital avatars and virtual humans, Human101 marks a notable advance: a framework capable of dynamic 3D human reconstruction from single-view videos that trains in roughly 100 seconds while rendering in real time.

Speed and Quality in Human Modeling

Human101 accelerates the generation of virtual humans without compromising the quality of the resulting avatars. It does so by optimizing a 3D Gaussian Splatting (3D GS) representation of the subject within roughly 100 seconds of training and rendering 512×512 images at over 100 frames per second.
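To make the representation concrete, 3D GS models the subject as a set of anisotropic Gaussians, each carrying a center, rotation, scale, opacity, and spherical-harmonic color coefficients. The following is a minimal PyTorch sketch of such a parameter container; the class and field names are illustrative assumptions rather than the Human101 implementation.

```python
import torch

class GaussianCloud:
    """Minimal container for per-Gaussian parameters in 3D Gaussian Splatting.
    Names and shapes are illustrative, not taken from the Human101 code."""

    def __init__(self, num_gaussians: int, sh_degree: int = 3, device: str = "cpu"):
        n = num_gaussians
        # 3D centers of the Gaussians in the canonical space.
        self.means = torch.zeros(n, 3, device=device, requires_grad=True)
        # Per-axis log-scales plus unit quaternions define each anisotropic covariance.
        self.log_scales = torch.zeros(n, 3, device=device, requires_grad=True)
        self.rotations = torch.zeros(n, 4, device=device)
        self.rotations[:, 0] = 1.0               # identity quaternions, ordered (w, x, y, z)
        self.rotations.requires_grad_(True)
        # Opacity (stored pre-sigmoid) and spherical-harmonic color coefficients.
        self.opacity_logits = torch.zeros(n, 1, device=device, requires_grad=True)
        num_sh = (sh_degree + 1) ** 2
        self.sh_coeffs = torch.zeros(n, num_sh, 3, device=device, requires_grad=True)

    def parameters(self):
        return [self.means, self.log_scales, self.rotations,
                self.opacity_logits, self.sh_coeffs]
```

All of these tensors are optimized directly by gradient descent against the input video, which is part of why this explicit representation converges so much faster than NeRF-based pipelines.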

Technical Innovations

The framework's efficiency and quality rest on a set of technical innovations:

  • Canonical Human Initialization: Training is accelerated by extracting per-frame point clouds with a monocular human reconstruction method, fusing them into the canonical pose, and centering the initial Gaussians on the fused points, giving the optimization a strong, human-shaped starting point (a sketch of this step follows the list).
  • Human-centric Forward Gaussian Animation: Rather than the inverse skinning used by prior NeRF-based pipelines, Human101 deforms the canonical 3D Gaussians forward into each observed pose by updating their parameters directly. Skipping the per-query backward mapping is what delivers the large rendering speedup (see the second sketch after this list).
  • Human-centric Gaussian Refinement: Beyond initialization and animation, Human101 refines the positions, rotations, and scales of the Gaussians and adjusts their view directions per pose, capturing subtle movements and pose-dependent deformations for high-fidelity reconstruction (see the third sketch after this list).
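
To illustrate the Canonical Human Initialization step: each video frame is lifted to a point cloud, the points are mapped back to the canonical pose, and one initial Gaussian is centered on every fused point. The sketch below assumes hypothetical helpers `estimate_point_cloud` and `to_canonical` (stand-ins for a monocular human reconstructor and an un-posing step such as inverse skinning with fitted SMPL parameters); neither name comes from the Human101 code.

```python
import torch

def fuse_canonical_point_cloud(frames, per_frame_params, estimate_point_cloud, to_canonical):
    """Fuse per-frame monocular point clouds into a single canonical cloud.

    `estimate_point_cloud(frame)` and `to_canonical(points, params)` are
    placeholders for a monocular human reconstructor and an un-posing step;
    they are assumptions, not the actual Human101 components."""
    fused = []
    for frame, params in zip(frames, per_frame_params):
        pts = estimate_point_cloud(frame)        # (N_i, 3) points in the posed space
        fused.append(to_canonical(pts, params))  # map them back to the canonical pose
    fused = torch.cat(fused, dim=0)              # (sum_i N_i, 3) canonical point cloud
    # Each fused point becomes the center of one initial Gaussian,
    # e.g. GaussianCloud.means in the earlier sketch.
    return fused
```

Fusing clouds from several frames of the moving person fills in regions that no single frame can see, which is what makes the initialization resemble a complete canonical-pose human rather than a partial surface.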
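The Human-centric Forward Gaussian Animation step can be pictured as forward linear blend skinning applied directly to the Gaussian parameters. The sketch below assumes per-Gaussian skinning weights (for example, copied from the nearest SMPL vertex) and per-frame posed bone transforms; it deforms centers and orientations only, and the function names are illustrative.

```python
import torch

def quat_to_rotmat(q):
    """Convert unit quaternions (N, 4), ordered (w, x, y, z), to rotation matrices (N, 3, 3)."""
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)

def forward_animate(means, rotations, skin_weights, bone_transforms):
    """Forward linear blend skinning applied to Gaussian centers and orientations.

    means:           (N, 3) canonical Gaussian centers
    rotations:       (N, 4) canonical unit quaternions
    skin_weights:    (N, J) per-Gaussian skinning weights (rows sum to 1)
    bone_transforms: (J, 4, 4) posed bone transforms for the current frame
    """
    # Blend the bone transforms per Gaussian: (N, 4, 4).
    T = torch.einsum("nj,jab->nab", skin_weights, bone_transforms)
    R, t = T[:, :3, :3], T[:, :3, 3]

    # Move centers into the posed space and rotate each Gaussian's local frame.
    posed_means = torch.einsum("nab,nb->na", R, means) + t
    posed_rotmats = R @ quat_to_rotmat(rotations)
    return posed_means, posed_rotmats
```

Because the canonical Gaussians are pushed forward once per pose and then splatted, rendering avoids the per-sample inverse-skinning queries that NeRF-based pipelines perform along every ray, which is consistent with the reported frame-rate gains.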
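Finally, the Human-centric Gaussian Refinement step adds pose-dependent corrections on top of the skinned Gaussians. The module below is only a hedged guess at what such a refiner might look like (a small MLP predicting position, rotation, and scale offsets from a pose code); the actual Human101 architecture may differ, and view directions would additionally be rotated back into the canonical frame before evaluating the spherical-harmonic colors.

```python
import torch
import torch.nn as nn

class GaussianRefiner(nn.Module):
    """Illustrative per-frame refinement: predict small offsets for Gaussian
    positions, rotations, and scales from a pose code. The architecture is a
    guess, not the Human101 design."""

    def __init__(self, pose_dim: int = 72, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3),   # delta position, delta rotation (quat), delta log-scale
        )

    def forward(self, canonical_means, pose_code):
        # canonical_means: (N, 3); pose_code: (pose_dim,) for the current frame.
        n = canonical_means.shape[0]
        x = torch.cat([canonical_means, pose_code.unsqueeze(0).expand(n, -1)], dim=-1)
        d_pos, d_rot, d_scale = self.mlp(x).split([3, 4, 3], dim=-1)
        return d_pos, d_rot, d_scale
```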

Experimental Validation

Extensive tests on the ZJU-MoCap and MonoCap datasets show that Human101 reconstructs a dynamic human quickly and sustains high rendering frame rates with comparable or better visual quality than prior methods. The full pipeline runs on a single RTX 3090 GPU, making it well suited to real-time interactive applications and immersive virtual experiences.

Contributions

Human101’s contributions to the field are substantial:

  • It offers a highly efficient and explicit representation method for dynamic 3D human modeling that significantly outperforms existing methods in terms of training speed and rendering performance.
  • The framework innovates with new methods for initializing, animating, and refining the Gaussian representations used to model the 3D human figures.
  • Its impressive rendering speed (>100 FPS at 512 resolution and >60 FPS at 1024 resolution) combined with high-fidelity visuals makes it a powerful tool for real-time applications.

Conclusion

Human101 presents a fast, high-quality solution for creating avatars from single-view videos, with clear relevance to virtual reality and digital human animation. By cutting training time to around 100 seconds and rendering in real time, it stands to improve the accessibility and versatility of virtual human creation, and with its code release it is positioned to become a valuable asset in the growing field of virtual human modeling.
