Emergent Mind

Abstract

We propose DoubleFusion, a new real-time system that combines volumetric dynamic reconstruction with data-driven template fitting to simultaneously reconstruct detailed geometry, non-rigid motion, and the inner human body shape from a single depth camera. A key contribution of this method is a double-layer representation consisting of a complete parametric body shape on the inside and a gradually fused outer surface layer. A pre-defined node graph on the body surface parameterizes the non-rigid deformations near the body, while a free-form, dynamically changing graph parameterizes the outer surface layer far from the body, allowing more general reconstruction. We further propose a joint motion tracking method based on the double-layer representation that enables robust and fast motion tracking. Moreover, the inner body shape is optimized online and constrained to fit inside the outer surface layer. Overall, our method enables increasingly denoised, detailed, and complete surface reconstructions, fast motion tracking, and plausible inner body shape reconstruction in real time. In particular, experiments show improved fast-motion tracking and loop-closure performance on challenging scenarios.

Overview

  • DoubleFusion captures human performances from a single depth sensor, recovering detailed body geometry and motion in real time.

  • The system combines volumetric reconstruction with template fitting to simultaneously track the inner body shape and outer surface.

  • A double-layer representation improves modeling by letting the inner body shape and the outer surface reconstruction inform each other.

  • DoubleFusion tracks and reconstructs complex movements and clothing in real time, outperforming prior single-view methods on fast motions and loop closures.

  • It runs efficiently on consumer-grade hardware, suggesting practical applications in entertainment and virtual clothing fitting.

Introduction

The paper introduces DoubleFusion, a system designed to capture and reconstruct human performances in real time using a single depth sensor, such as those found in consumer-grade devices. DoubleFusion integrates volumetric dynamic reconstruction with template fitting to extract detailed surface geometry, motion, and the inner shape of the human body simultaneously.

Technical Contributions

The authors of the paper describe several key contributions to the field of human performance capture:

  • A double-layer representation that combines a parametric model of the inner human body with a non-rigid, gradually fused outer surface layer. Each layer leverages information from the other to inform its reconstruction.
  • A joint motion tracking algorithm that accounts for the pose of the inner shape, as well as the non-rigid deformations of the outer surface. The method optimizes for both using feature correspondences enhanced by the double layer representation.
  • A volumetric shape-pose optimization process that fits the parametric body model inside the outer surface layer, without the need for a pre-scanned model template.
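To give a rough sense of how the deformable layers work, the sketch below shows embedded deformation, the standard technique behind node-graph warp fields: each surface vertex is deformed by blending the rigid transforms of its nearest graph nodes. The Gaussian weighting, k-nearest node selection, and all names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def warp_vertex(v, node_pos, node_rot, node_trans, sigma=0.05, k=4):
    """Deform vertex v by blending the rigid transforms of its k nearest
    deformation-graph nodes (embedded deformation, applicable to both an
    on-body node graph and a free-form one)."""
    d = np.linalg.norm(node_pos - v, axis=1)        # distance to every node
    idx = np.argsort(d)[:k]                         # k nearest nodes
    w = np.exp(-d[idx] ** 2 / (2 * sigma ** 2))     # Gaussian blend weights
    w /= w.sum()                                    # normalize to sum to 1
    warped = np.zeros(3)
    for wi, j in zip(w, idx):
        # each node rotates the vertex about the node's position,
        # then adds the node's translation
        warped += wi * (node_rot[j] @ (v - node_pos[j]) + node_pos[j] + node_trans[j])
    return warped
```

With identity rotations and a common translation across all nodes, the blend reduces to a pure translation of the vertex, which is a convenient sanity check for an implementation like this.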

System Pipeline

The DoubleFusion system pipeline consists of several stages, each aimed at different aspects of the performance capture process:

  • Initialization: requires the subject to start in a rough A-pose and uses the first frame of depth data to set up the initial parameters.
  • Joint Motion Tracking: optimizes both the pose of the inner model and non-rigid deformations of the outer surface using the double-layer representation.
  • Geometric Fusion: incorporates depth data from multiple frames into a reference volume to build more complete surface geometry.
  • Volumetric Shape-Pose Optimization: refines the parameters of the inner body model to better align with observed data in the updated reference volume.

Results and Comparison

DoubleFusion shows notable improvements over state-of-the-art methods. When compared to systems that only reconstruct the outer surface, DoubleFusion provides significantly better handling of fast motions and loop closures. Its real-time reconstruction of the inner body shape achieves plausible and detailed results in various scenarios, including those with challenging clothing and dynamic movements.

The system is evaluated both qualitatively and quantitatively, with visual and numerical results demonstrating real-time tracking and reconstruction. DoubleFusion runs at 32 ms per frame on an NVIDIA TITAN X GPU, fast enough for practical consumer applications.

Conclusion

DoubleFusion is presented as a significant step forward in real-time human performance capture, enabling detailed reconstruction of both the clothed surface and the underlying body shape from a single depth camera. The system's robustness and accuracy open up a broad range of applications, from entertainment to virtual try-on, that were previously infeasible with existing methods.
