IMUSIC: IMU-based Facial Expression Capture

(arXiv:2402.03944)
Published Feb 3, 2024 in cs.CV

Abstract

For facial motion capture and analysis, the dominant solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) offer a potential alternative, yet they have mainly been adopted for full-body motion capture. In this paper, we propose IMUSIC, a novel path for facial expression capture using purely IMU signals, significantly distinct from previous visual solutions, to fill this gap. The key design in IMUSIC is a trilogy. We first design micro-IMUs suited to facial capture, accompanied by an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. This unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. Specifically, we tailor a Transformer diffusion model with a two-stage training strategy for this novel tracking task. The IMUSIC framework enables accurate facial capture in scenarios where visual methods falter, while simultaneously safeguarding user privacy. We conduct extensive experiments on both the IMU configuration and the technical components to validate the effectiveness of our approach. Notably, IMUSIC enables various novel applications, e.g., privacy-protecting facial capture, hybrid capture against occlusions, and detecting minute facial movements that are often invisible to visual cues. We will release our dataset and implementations to enrich the possibilities of facial capture and analysis in our community.

Overview

  • IMUSIC leverages Inertial Measurement Units (IMUs) for non-visual motion capture of facial expressions, addressing privacy and occlusion issues.

  • The research includes the creation of micro-IMUs for anatomically strategic placement on the face, alongside a novel IMU-ARKit dataset.

  • A two-stage training strategy is proposed for predicting facial blendshape parameters from inertial data to overcome data scarcity and improve model generalizability.

  • Extensive experiments validate the effectiveness of IMUSIC against visual-based techniques, with substantial applications in anonymized animation and occluded scenarios.

Introduction to IMUSIC

In the realm of facial motion capture, common methods have largely relied on visual data. However, visual-based systems have limitations such as privacy concerns and vulnerability to occlusions. IMUSIC introduces a non-visual alternative with distinctive approaches to address these issues.

The foundation of IMUSIC lies in employing Inertial Measurement Units (IMUs) in a novel manner, specifically designed for the facial anatomy. The core advantage of IMUs is that they operate without visual input, thus overcoming occlusion problems and preserving privacy. The innovation does not stop at hardware design; the research extends to the creation of a unique dataset and the adaptation of a Transformer diffusion model that predicts facial expressions solely from inertial data (a minimal sketch of such a model follows below).
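
As a concrete illustration, below is a minimal PyTorch sketch of an IMU-conditioned Transformer denoiser for blendshape sequences, in the spirit of the paper's model. The layer sizes, the IMU channel count (here four 6-axis IMUs), and all names are our assumptions, not the authors' released implementation.

```python
# Minimal sketch of an IMU-conditioned Transformer denoiser (assumed
# architecture; not the authors' code). It maps a noisy blendshape
# sequence, per-frame IMU signals, and a diffusion timestep to a
# clean blendshape sequence.
import torch
import torch.nn as nn

class IMUConditionedDenoiser(nn.Module):
    def __init__(self, n_blendshapes=52, imu_dim=4 * 6, d_model=256,
                 n_heads=4, n_layers=6, max_steps=1000):
        super().__init__()
        self.bs_proj = nn.Linear(n_blendshapes, d_model)  # noisy blendshape frames
        self.imu_proj = nn.Linear(imu_dim, d_model)       # per-frame IMU condition
        self.t_embed = nn.Embedding(max_steps, d_model)   # diffusion timestep
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_blendshapes)

    def forward(self, noisy_bs, imu, t):
        # noisy_bs: (B, T, n_blendshapes); imu: (B, T, imu_dim); t: (B,) long
        x = self.bs_proj(noisy_bs) + self.imu_proj(imu)
        x = x + self.t_embed(t)[:, None, :]               # broadcast over time
        return self.head(self.encoder(x))                 # x0-style prediction (assumed)
```

At sampling time, such a denoiser would be iterated over the diffusion schedule to produce a blendshape sequence conditioned on the recorded IMU stream.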

Design and Dataset

Central to IMUSIC is the development of micro-IMUs compact enough to be strategically placed on the face. The researchers propose an anatomy-driven placement scheme so that the IMUs sit at positions that capture a comprehensive range of facial muscle movements without obstructing natural expressions. To support the research community, the team has assembled the IMU-ARKit dataset, which pairs IMU data with visual signals for diverse facial expressions and performances.

The paper describes a two-stage training strategy for predicting facial blendshape parameters from IMU data: pre-training on virtual IMU signals synthesized from existing visual datasets, followed by fine-tuning on real paired IMU/visual recordings. This strategy addresses data scarcity and improves model generalizability; the sketch after this paragraph illustrates how such virtual signals can be derived.
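
For intuition about the pre-training stage, here is a hedged sketch of how a virtual accelerometer signal could be synthesized from a visually captured vertex trajectory via second-order finite differences. The gravity handling, frame rate, and names such as virtual_accel are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: synthesize a "virtual" accelerometer reading from the
# trajectory of a mesh vertex at an assumed IMU attachment point.
import numpy as np

def virtual_accel(positions, fps, gravity=np.array([0.0, -9.81, 0.0])):
    """positions: (T, 3) world-space trajectory of one attachment vertex."""
    dt = 1.0 / fps
    # Second-order central difference: a_t = (p_{t+1} - 2 p_t + p_{t-1}) / dt^2
    accel = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # An accelerometer measures specific force, i.e. motion acceleration
    # minus gravity (so a device at rest reads +9.81 m/s^2 upward).
    return accel - gravity

# Hypothetical usage on a blendshape-driven mesh sequence:
# traj = mesh_vertices[:, ATTACH_VERTEX_ID, :]   # (T, 3), names assumed
# acc = virtual_accel(traj, fps=60)              # (T - 2, 3)
```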

Experimental Insights

Extensive experiments showcase the effectiveness of the IMUSIC approach. Candidate IMU placements on the face are evaluated to ensure that movements are captured with high signal-to-noise ratios (one plausible scoring scheme is sketched below). Through a series of numerical results, the paper demonstrates that the method captures nuanced facial movements and compares favorably with state-of-the-art visual-based techniques such as DECA and 3DDFA-V2.
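
As one plausible way to rank candidate placements, the snippet below scores a site by the ratio of expression-driven signal power to noise power estimated from a rest recording. This protocol and the function name placement_snr_db are our assumptions, not necessarily the paper's exact procedure.

```python
# Illustrative placement scoring: higher SNR means the site's IMU
# channels move more during expressions relative to their rest noise.
import numpy as np

def placement_snr_db(expression_signal, rest_signal):
    """Both arrays: (T, C) IMU channels (e.g. 3 accel + 3 gyro axes)."""
    signal_power = np.var(expression_signal, axis=0).mean()
    noise_power = np.var(rest_signal, axis=0).mean()
    return 10.0 * np.log10(signal_power / noise_power)
```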

Practical Applications

IMUSIC has significant applications, particularly where privacy is paramount or visual methods struggle. It offers a novel solution for Virtual YouTubers (VTubers) who wish to remain anonymous while animating their avatars with real-time expressions. IMUSIC also shows strong potential in hybrid capture scenarios, complementing systems like ARKit when the face is partially obscured, as in professional voice recording environments. Furthermore, owing to its sensitivity, IMUSIC can capture minute facial movements, such as subtle cheek puffs, that are typically invisible to camera-based systems.

Conclusion and Future Directions

IMUSIC establishes itself as an innovative facial motion capture approach, not only through its strategic use of IMUs but also through the IMU-ARKit dataset and neural network models tailored to inertial signals. The research opens doors to further exploration: improved wearable designs, refined training strategies, and the combination of IMU data with other non-visual modalities for more comprehensive facial capture. To support community research, the dataset and code of IMUSIC will be made publicly available, fostering advancement in this emerging field of non-visual facial motion capture.
