Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation (2203.15720v3)

Published 29 Mar 2022 in cs.CV and cs.GR

Abstract: Real-time human motion reconstruction from a sparse set of (e.g. six) wearable IMUs provides a non-intrusive and economic approach to motion capture. Without the ability to acquire position information directly from IMUs, recent works took data-driven approaches that utilize large human motion datasets to tackle this under-determined problem. Still, challenges remain such as temporal consistency, drifting of global and joint motions, and diverse coverage of motion types on various terrains. We propose a novel method to simultaneously estimate full-body motion and generate plausible visited terrain from only six IMU sensors in real-time. Our method incorporates 1. a conditional Transformer decoder model giving consistent predictions by explicitly reasoning prediction history, 2. a simple yet general learning target named "stationary body points" (SBPs) which can be stably predicted by the Transformer model and utilized by analytical routines to correct joint and global drifting, and 3. an algorithm to generate regularized terrain height maps from noisy SBP predictions which can in turn correct noisy global motion estimation. We evaluate our framework extensively on synthesized and real IMU data, and with real-time live demos, and show superior performance over strong baseline methods.

Citations (58)


Summary

The paper presents a novel Transformer decoder that reconstructs full-body motion from only six IMU sensors while addressing joint drift.
It introduces stationary body points (SBPs), a learning target that the model predicts stably and that analytical routines use to correct joint and global drift.
The system simultaneously generates regularized terrain maps in real-time, improving contextual motion capture across diverse environments.

An Overview of Transformer Inertial Poser: Real-time Human Motion Reconstruction

Introduction

The paper "Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation" introduces an approach to motion capture that uses only six inertial measurement units (IMUs). Because IMUs do not measure position directly, the problem is under-determined, and existing data-driven systems still struggle with temporal consistency, drift in global and joint motion, and coverage of motion types across different terrains. The method uses a conditional Transformer decoder that reasons over its own prediction history to keep outputs temporally consistent, and it generates plausible terrain maps in real time.

Methodology

The primary contribution of this research is the introduction of a system referred to as the Transformer Inertial Poser (TIP). TIP takes a data-driven approach to derive full-body motion and terrain mapping from sparse IMU data. The major components of this system include:

Conditional Transformer Decoder Model: This model takes the IMU readings as input and estimates the current full-body motion while explicitly conditioning on its own prediction history, which keeps successive predictions consistent.
Stationary Body Points (SBPs): The paper introduces SBPs, points on the body that are predicted to have negligible velocity at a given moment. The Transformer predicts them stably, and analytical routines then use them to correct the joint and global (root) drift often encountered in IMU-based systems.
Terrain Generation Algorithm: This algorithm constructs regularized terrain height maps from noisy SBP predictions. The resulting terrain can in turn be used to correct noisy global motion estimates.
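To make the role of SBPs concrete, the following is a minimal sketch of how a point predicted to be stationary could anchor the root translation. It is not the paper's routine: the function name `correct_root_with_sbp`, the single tracked body point, and the assumption that root-relative offsets come from forward kinematics are all illustrative choices.

```python
import numpy as np

def correct_root_with_sbp(root_pred, offsets, stationary):
    """Remove root drift using one body point that is sometimes stationary.

    root_pred:  (T, 3) root positions from a drifting estimator
    offsets:    (T, 3) position of the body point relative to the root
                (assumed to come from forward kinematics on the predicted pose)
    stationary: (T,) bool, True where the point is predicted to be an SBP
    """
    root_pred = np.asarray(root_pred, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    root = root_pred.copy()
    for t in range(1, len(root)):
        # Default: carry the predicted frame-to-frame displacement forward.
        root[t] = root[t - 1] + (root_pred[t] - root_pred[t - 1])
        if stationary[t] and stationary[t - 1]:
            # The point should not have moved in world space, so place the
            # root such that the point stays where it was one frame earlier.
            anchor = root[t - 1] + offsets[t - 1]
            root[t] = anchor - offsets[t]
    return root
```

With a foot planted at a fixed world position, the corrected root follows the leg kinematics exactly and ignores any drift accumulated in `root_pred`; when no point is stationary, the function returns the original prediction unchanged.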

Because the system produces both the human motion and the terrain it traverses in real time, it supplies spatial context that IMU-only methods typically lack.
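As an illustration of turning noisy SBP positions into a regularized height map, the sketch below bins contact points into grid cells, takes a per-cell median, and snaps near-zero heights to flat ground. The paper's actual algorithm is not reproduced here; the grid size `cell`, the tolerance `flat_tol`, the z-up convention, and the name `build_height_map` are assumptions made for the example.

```python
import numpy as np
from collections import defaultdict

def build_height_map(points, cell=0.25, flat_tol=0.05):
    """Build a sparse height map from noisy stationary contact points.

    points:   (N, 3) array of x, y, height samples (height axis assumed up)
    cell:     grid cell size in metres (hypothetical default)
    flat_tol: heights within this distance of zero are treated as flat ground
    Returns a dict mapping (ix, iy) grid indices to a height in metres.
    """
    buckets = defaultdict(list)
    for x, y, h in np.asarray(points, dtype=float):
        key = (int(np.floor(x / cell)), int(np.floor(y / cell)))
        buckets[key].append(h)

    height_map = {}
    for key, heights in buckets.items():
        # The median is robust to an occasional bad SBP prediction.
        h = float(np.median(heights))
        # Crude regularization: snap small heights to the ground plane.
        height_map[key] = 0.0 if abs(h) < flat_tol else h
    return height_map
```

A height map of this kind could then be queried at the position of a planted foot to pull the estimated global height back toward the terrain surface.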

Results and Evaluation

The authors evaluated TIP extensively on both synthesized and real IMU data and report that it outperforms strong baseline methods. The results highlight improvements in:

Joint Angle and Position Accuracy: TIP achieves lower mean joint angle and joint position errors than the baselines, indicating more precise reconstruction of joint movements.
Root Translation and Jitter: TIP achieves lower root translation error and reduced position jitter, indicating more stable motion capture.
Real-Time Application Performance: Live demonstrations show that the system runs in real time, supporting its practicality for interactive applications.
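For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. The definitions are assumptions for illustration (Euclidean distance for position error, geodesic angle between rotation matrices for angle error, and mean finite-difference jerk for jitter) and may differ in detail from the paper's evaluation protocol.

```python
import numpy as np

def mean_joint_position_error(pred, gt):
    """pred, gt: (T, J, 3) joint positions. Mean Euclidean distance."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def mean_joint_angle_error_deg(R_pred, R_gt):
    """R_pred, R_gt: (T, J, 3, 3) rotation matrices. Mean geodesic angle in degrees."""
    R_rel = np.swapaxes(R_pred, -1, -2) @ R_gt          # R_pred^T R_gt
    cos = (np.trace(R_rel, axis1=-2, axis2=-1) - 1.0) / 2.0
    angles = np.arccos(np.clip(cos, -1.0, 1.0))         # clip guards rounding error
    return float(np.degrees(angles).mean())

def jitter(pos, fps):
    """pos: (T, J, 3). Mean magnitude of jerk (third time derivative of position)."""
    jerk = np.diff(pos, n=3, axis=0) * fps ** 3         # finite differences
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

Under this jerk-based definition, motion at constant velocity has zero jitter, so the metric penalizes frame-to-frame shakiness rather than movement itself.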

Implications

The advancements introduced by TIP could impact various domains utilizing motion capture, including virtual reality, biomechanics, and sports science. The ability to generate motion and terrain concurrently not only enhances motion capture systems’ ability to interpret dynamic environments but also opens up possibilities for applications involving complex terrain interactions.

Future Directions

Future work may involve improving the robustness of TIP by training on more diverse real-world datasets that cover varied terrains and motion types. Personalizing the model for individual users could improve accuracy across different body types and motion styles. Additionally, integrating broader environmental context into terrain generation could yield more realistic terrains in ambiguous scenarios.

Overall, the Transformer Inertial Poser advances IMU-based motion capture by combining a history-conditioned Transformer decoder, SBP-based drift correction, and terrain generation to address longstanding problems of temporal consistency and drift.
