We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera. In contrast to existing approaches that use six or more expert-level IMU devices, our approach is much more cost-effective and convenient. Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments. As a key idea to overcome the extreme sparsity and ambiguities of sensor inputs with different modalities, we integrate 6D head poses obtained from the head-mounted camera for motion estimation. To enable capture in expansive indoor and outdoor scenes, we propose an algorithm to track and update floor level changes to define head poses, coupled with a multi-stage Transformer-based regression module. We also introduce novel strategies leveraging visual cues of egocentric images to further enhance the motion capture quality while reducing ambiguities. We demonstrate the performance of our method on various challenging scenarios, including complex outdoor environments and everyday motions such as object interactions and social interactions among multiple individuals.

Figure: Positioning of IMU sensors and cameras for calibration and coordinate alignment.


  • The paper discusses how traditional motion capture technology requires many sensors and controlled environments, making it inaccessible and limited.

  • The proposed method uses just two smartwatches and a head-mounted camera to make motion capture more affordable and versatile, both indoors and outdoors.

  • The system employs a multi-stage Transformer-based module and algorithms that calibrate head poses against varying floor levels for precise movement estimation.

  • Visual cues from the head-mounted camera complement the sensor data, helping to resolve ambiguities and improve the capture of complex motions.

  • This research offers a cost-effective approach to motion capture that could benefit fields like sports, healthcare, and media, expanding the horizons for analyzing and creating human motion in 3D.

Introduction to Motion Capture Technology

Motion capture technology is vital for replicating the intricacies of human movement in virtual environments, films, and interactive systems. Traditional motion capture often requires a large number of sensors and a controlled environment, which limits accessibility and convenience. Moreover, acquiring comprehensive motion capture data that reflects varied real-world scenarios is difficult because it depends on expert equipment and complex setups, leaving researchers with far less data than in domains such as imagery and language.

Democratization of Motion Capture

Aiming to make motion capture more accessible, this work proposes a novel method that requires only a head-mounted camera and two smartwatches. This setup dramatically reduces the cost and complexity of capturing motion and does not require the individual to be in a specific location. With the advent of smartwatches and wearable cameras, motion capture can now be conducted indoors or outdoors, capturing a wide variety of movements, from daily interactions to various outdoor activities.

Enhancing Motion Capture with Smart Technology

To cope with the sparsity and ambiguity of the input that comes from using only two smartwatches, the system integrates 6D head poses obtained from the head-mounted camera. To keep these head poses well defined in expansive indoor and outdoor scenes, an algorithm tracks and updates floor level changes, which supports motion capture on uneven terrain such as stairs or hills. A multi-stage Transformer-based regression module then processes the sensor data to estimate full-body motion.
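The idea of tracking floor level changes so that the head pose is always defined relative to the current floor can be illustrated with a minimal sketch. This is not the paper's algorithm: the class `FloorTracker`, the use of a foot-contact height as the update signal, and the `threshold` value are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FloorTracker:
    """Hypothetical floor-level tracker (illustrative, not the paper's method)."""

    floor_z: float = 0.0     # current floor height estimate in the world frame (m)
    threshold: float = 0.10  # assumed minimum change (m) treated as a new floor level

    def update(self, foot_contact_z: float) -> float:
        """Update the floor estimate from the height of a foot judged to be in contact.

        Small deviations are treated as noise and ignored; a larger step
        (for example, climbing one stair) moves the floor level to the new height.
        """
        if abs(foot_contact_z - self.floor_z) > self.threshold:
            self.floor_z = foot_contact_z
        return self.floor_z

    def head_height_above_floor(self, head_z_world: float) -> float:
        """Express the head height relative to the current floor level."""
        return head_z_world - self.floor_z


tracker = FloorTracker()
for contact_z in (0.0, 0.02, 0.18, 0.36):  # assumed contact heights while climbing stairs
    tracker.update(contact_z)
relative_head_height = tracker.head_height_above_floor(2.06)
```

Under these assumptions, the head height passed to a downstream pose regressor stays near the wearer's standing height even as the world-frame height rises, which is the property the floor-level update is meant to preserve.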

Additionally, the method uses visual cues from the head-mounted camera to resolve ambiguities that traditional IMU sensors face. These visual cues are beneficial in scenarios where objects are being handled or during social interactions, as they help disambiguate the captured movements by providing visual context.

Contributions and Potential Applications

The research makes three main contributions. It presents what the authors describe as the first method capable of high-quality full-body motion capture using consumer-level devices, it tracks and updates floor levels across a wide range of environments, and it improves motion capture quality by leveraging visual information from the head-mounted camera.

With its lightweight and affordable setup, the proposed method is a step towards democratizing motion capture technology: it lets researchers collect motion data in settings that were previously impractical and lets a wider audience create detailed 3D animations with everyday devices. The approach could benefit fields such as sports analysis, healthcare, animation, and interactive media, opening up new possibilities for understanding and generating human motion data in natural settings.

