Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging (2404.19541v1)

Published 30 Apr 2024 in cs.CV, cs.AI, cs.GR, and eess.SP

Abstract: While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65$ cm ($22\%$ better) and lowering jitter from $1.56$ to $0.055$ km/s$^3$ (a reduction of $97\%$).

Summary

The paper introduces a motion capture system that combines sparse inertial sensors with UWB ranging and reduces position error by 22% relative to PIP and TIP.
The methodology employs a dual-branch architecture with an LSTM for temporal dynamics and a distance attention graph network to fuse spatial constraints.
Empirical results demonstrate a 97% reduction in jitter and underline the approach’s scalability for applications in VR, gaming, and rehabilitation.

An Analysis of "Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging"

The paper "Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging" presents a method for tracking full-body motion with sparse inertial measurement units (IMUs) that are augmented by ultra-wideband (UWB) radios, which measure inter-sensor distances. The authors fuse these two data streams in a graph-based neural network to estimate human pose.

Methodological Advancements

The authors implement a new wearable sensing system that integrates 6DOF IMUs with UWB radios on compact wireless nodes. Each node estimates orientation and acceleration, and the UWB radios add dynamic inter-sensor distance estimates without stationary anchors. This is a notable advance over previous systems that rely heavily on sensors embedded in the environment.
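The abstract notes that training data is synthesized from AMASS, including the distance estimates. A minimal sketch of what such distance synthesis could look like follows, assuming sensor positions are already available as an array of shape (frames, sensors, 3) in metres. The function names, the Gaussian noise model, and the 5 cm noise level are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def pairwise_distances(positions):
    """Ideal inter-sensor distances.

    positions: array of shape (T, N, 3), sensor positions in metres.
    Returns an array of shape (T, N, N) with Euclidean distances.
    """
    diff = positions[:, :, None, :] - positions[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def simulate_uwb_ranges(positions, noise_std=0.05, seed=0):
    """Perturb ideal distances with symmetric zero-mean Gaussian noise.

    The noise model and noise_std are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    dist = pairwise_distances(positions)
    n = dist.shape[-1]
    noise = rng.normal(0.0, noise_std, size=dist.shape)
    rows, cols = np.triu_indices(n, k=1)
    # Keep noise only above the diagonal, then mirror it so that the
    # simulated range between sensors i and j equals the one between j and i.
    sym = np.zeros_like(dist)
    sym[:, rows, cols] = noise[:, rows, cols]
    sym = sym + sym.transpose(0, 2, 1)
    # Distances cannot be negative; the diagonal stays exactly zero.
    return np.clip(dist + sym, 0.0, None)
```

With 6 trackers, each frame yields a 6 x 6 symmetric matrix holding 15 distinct pairwise distances.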

For processing, the authors use a two-branch architecture: an LSTM network to capture temporal dynamics from IMU data and a Distance Attention Graph Convolutional Network (DA-GCN) to utilize inter-sensor distances. These branches are fused, allowing for a consistent estimation of sensor positions relative to the body, a crucial step in improving overall pose accuracy.
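To make the idea of distance-based attention over a sensor graph concrete, here is a minimal NumPy sketch of a single graph layer whose attention weights are derived from inter-sensor distances. It is not the paper's DA-GCN: the rule that closer sensors receive more weight, the `temperature` parameter, and the `tanh` activation are all assumptions chosen for illustration, and the temporal LSTM branch is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distance_attention(distances, temperature=0.5):
    """Turn an (N, N) distance matrix in metres into row-normalized weights.

    Assumption for this sketch: a smaller distance gives a larger weight.
    """
    return softmax(-distances / temperature, axis=-1)

def distance_attention_layer(features, distances, weight, temperature=0.5):
    """One graph layer: mix per-sensor features by distance attention.

    features:  (N, F) per-sensor feature vectors
    distances: (N, N) inter-sensor distances
    weight:    (F, H) projection matrix (learned in a real model)
    Returns an (N, H) array of updated per-sensor features.
    """
    attn = distance_attention(distances, temperature)
    return np.tanh(attn @ features @ weight)
```

In a full model, the output of such a layer would be combined with the temporal branch before the pose is regressed; how the paper performs that fusion is not specified in this summary.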

Empirical Evidence

The authors validate their method on a newly collected dataset (UIP-DB) of 10 participants performing 25 motion types. It comprises 200 minutes of motion data recorded synchronously from the IMU+UWB trackers and a 20-camera optical system that provides ground truth. This data supports the paper's central claim: Ultra Inertial Poser reduces position error from 13.62 cm to 10.65 cm, a 22% improvement over the existing methods PIP (Physical Inertial Poser) and TIP (Transformer Inertial Poser). It also lowers jitter from 1.56 to 0.055 km/s^3, a reported reduction of 97%, which indicates smoother motion predictions.
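The two headline metrics can be sketched as follows. This assumes position error is the mean Euclidean distance between predicted and ground-truth joint positions, and that jitter is the mean magnitude of jerk (the third time derivative of position), which matches the reported km/s^3 unit. The exact evaluation protocol is not given in this summary, so the function names and definitions below are assumptions.

```python
import numpy as np

def mean_position_error_cm(pred, truth):
    """Mean Euclidean joint error in centimetres.

    pred, truth: arrays of shape (T, J, 3) in metres.
    """
    return float(np.linalg.norm(pred - truth, axis=-1).mean() * 100.0)

def mean_jitter(pred, fps):
    """Mean jerk magnitude in m/s^3 (divide by 1000 for km/s^3).

    Jerk is approximated by a third-order finite difference over frames.
    """
    jerk = np.diff(pred, n=3, axis=0) * fps ** 3
    return float(np.linalg.norm(jerk, axis=-1).mean())

def relative_reduction(baseline, ours):
    """Fractional improvement of `ours` over `baseline` (lower is better)."""
    return (baseline - ours) / baseline
```

For the position errors quoted in the abstract, `relative_reduction(13.62, 10.65)` evaluates to about 0.218, consistent with the reported 22%.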

Implications and Future Directions

The implications of this work are multifaceted. Practically, the scalability and affordability of this sensor setup allow for widespread adoption in fields such as virtual reality, gaming, and rehabilitation, offering a promising alternative to bulkier camera-based systems. Theoretically, this research emphasizes the importance of integrating spatial constraints via UWB ranging to enhance inertial sensor capabilities, which could push the boundaries of mobile and untethered motion capture systems.

Looking forward, potential advancements might focus on enhancing the robustness of UWB-based ranging in more complex environments and broader human activities. Additionally, continual refinement in machine learning models, perhaps through hybrid models leveraging other sensing technologies, could improve accuracy and reduce the computational footprint, making real-time applications in even more resource-constrained environments feasible.

Overall, the paper significantly contributes to human pose estimation by innovatively combining sparse sensor data with spatial constraints, offering a valuable toolkit for the next generation of scalable and robust motion capture technologies.
