
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

(2406.16148)
Published Jun 23, 2024 in cs.SD, cs.AI, cs.LG, and eess.AS

Abstract

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets (~136K samples, 440 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.com/evelyn0414/OPERA.

Figure: System overview of OPERA, covering data curation, pretraining of respiratory acoustic models, and evaluation on downstream health tasks.

Overview

  • The paper proposes OPERA, an open system for pretraining and benchmarking respiratory acoustic foundation models, designed to improve health monitoring and disease detection by analyzing respiratory sounds such as coughing and breathing.

  • Three models (OPERA-CT, OPERA-CE, and OPERA-GT) are pretrained using self-supervised learning techniques and benchmarked across 19 respiratory health tasks, outperforming traditional methods and other pretrained audio models on 16 out of 19 tasks.

  • Future directions include exploring data-efficient fine-tuning, investigating model scaling, and developing novel pretraining strategies to better handle the unique characteristics of respiratory audio data.


This paper investigates the development and evaluation of respiratory acoustic foundation models. The authors recognize the significant potential of leveraging respiratory sounds, such as coughing and breathing, for health monitoring and disease detection. Applications span respiratory rate estimation, lung function analysis, sleep apnea detection, assessment of smoking effects, and diagnosis of respiratory diseases such as influenza and asthma.

Challenges in Existing Approaches

Traditional methods rely mainly on supervised deep learning models that require large volumes of labeled, task-specific data, which is labor-intensive and costly to collect. Conventional signal processing techniques, despite their utility, are limited in performance and usually demand domain expertise. Existing open-source acoustic models such as AudioMAE and CLAP are pretrained on general audio event datasets in which respiratory sounds account for a mere 0.3% of samples, limiting their ability to capture intricate respiratory sound variations. Moreover, although a model pretrained on respiratory sounds was presented recently, it is not open-source, which restricts replication, analysis, and further development.

OPERA: Open Respiratory Acoustic Foundation Model

To counter these limitations, the authors propose OPERA—an open respiratory acoustic foundation model pretraining and benchmarking system. OPERA is a comprehensive system that amalgamates the curation of datasets, the pretraining of acoustic models, and a rigorous benchmarking process.

Data Curation

The authors curate a diverse and large-scale dataset comprising approximately 136K samples, equating to around 440 hours of respiratory audio from five sources. These datasets cover multiple respiratory sound modalities, including breathing, coughing, and lung sounds, and are orders of magnitude larger than the existing datasets used for training open acoustic models.
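Although this summary does not detail the preprocessing pipeline, respiratory recordings are typically converted to spectrogram representations before pretraining. The sketch below shows one such step using librosa; the sample rate, mel resolution, and window parameters are illustrative assumptions rather than OPERA's exact configuration.

```python
# Minimal sketch: raw respiratory recording -> log-mel spectrogram.
# Parameters (16 kHz, 64 mel bands, 1024-sample FFT) are assumptions for illustration.
import librosa
import numpy as np

def to_log_mel(path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    audio, _ = librosa.load(path, sr=sr, mono=True)          # resample to a common rate
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels,
                                         n_fft=1024, hop_length=512)
    return librosa.power_to_db(mel, ref=np.max)               # shape: (n_mels, frames)
```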

Model Pretraining

The paper introduces three pretrained models using different self-supervised learning (SSL) techniques:

  1. OPERA-CT: A contrastive learning-based transformer model.
  2. OPERA-CE: A contrastive learning-based CNN model.
  3. OPERA-GT: A generative pretrained transformer model.

These approaches were chosen to leverage large-scale unlabeled data for learning meaningful representations, aiming to enhance the models' transferability and applicability to supervised fine-tuning tasks.
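To make the contrastive objective behind OPERA-CT and OPERA-CE concrete, the sketch below shows a generic InfoNCE-style training step in PyTorch, where two random crops of the same recording form a positive pair and crops from other recordings in the batch serve as negatives. The encoder, cropping strategy, and hyperparameters are assumptions for illustration and do not reproduce the paper's exact implementation.

```python
# Generic contrastive (InfoNCE-style) pretraining step on spectrogram batches.
# `encoder` is assumed to map (batch, mels, frames) crops to (batch, dim) embeddings.
import torch
import torch.nn.functional as F

def random_crop(spec: torch.Tensor, width: int) -> torch.Tensor:
    """Random time-crop of a (batch, mels, frames) spectrogram; assumes frames >= width."""
    start = torch.randint(0, spec.shape[-1] - width + 1, (1,)).item()
    return spec[..., start:start + width]

def info_nce_loss(encoder, spectrograms, crop_width=64, temperature=0.1):
    # Two crops of the same recording are a positive pair; other batch items are negatives.
    z1 = F.normalize(encoder(random_crop(spectrograms, crop_width)), dim=-1)
    z2 = F.normalize(encoder(random_crop(spectrograms, crop_width)), dim=-1)
    logits = z1 @ z2.t() / temperature           # pairwise similarity matrix
    targets = torch.arange(z1.shape[0])          # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)
```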

Benchmarking

The paper extensively benchmarks these pretrained models across 19 respiratory health tasks, categorized into health condition inference and lung function estimation. These tasks utilize ten labeled respiratory audio datasets, of which six were unseen during pretraining, ensuring fair and robust evaluation of the models' generalizability.
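A common way to benchmark frozen foundation-model representations on such tasks is a linear probe: features are extracted once from the pretrained encoder and only a lightweight classifier is trained per downstream task. The sketch below illustrates this idea with scikit-learn; the precomputed feature arrays and solver settings are illustrative assumptions.

```python
# Minimal linear-probe evaluation sketch: frozen encoder features -> linear classifier -> AUROC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def linear_probe_auroc(train_feats, train_labels, test_feats, test_labels):
    clf = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
    scores = clf.predict_proba(test_feats)[:, 1]   # probability of the positive class
    return roc_auc_score(test_labels, scores)
```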

Key Results and Findings

The findings demonstrate that the pretrained respiratory acoustic foundation models outperform traditional feature extraction methods and existing general audio pretrained models on 16 out of 19 tasks. Specific results include:

  • Health Condition Inference: The models surpass the 0.7 AUROC (area under the receiver operating characteristic curve) threshold on multiple tasks, indicating high utility in discriminating health conditions.
  • Lung Function Estimation: The generative pretrained models exhibit lower MAE (mean absolute error), particularly for tasks requiring global feature extraction.

Among the three models, OPERA-CT excels in classification-based tasks, while OPERA-GT performs robustly in regression tasks. The superior performance of transformer-based models compared to CNN models underscores the efficacy of these architectures in handling respiratory sound variations, albeit with higher computational demands.
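For the lung function estimation tasks, the analogous evaluation replaces the classifier with a regressor and reports MAE. A minimal sketch, assuming precomputed encoder features and a ridge regressor chosen purely for illustration:

```python
# Regression-style probe for lung function estimation, reporting MAE.
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def regression_probe_mae(train_feats, train_targets, test_feats, test_targets):
    reg = Ridge(alpha=1.0).fit(train_feats, train_targets)   # lightweight head on frozen features
    return mean_absolute_error(test_targets, reg.predict(test_feats))
```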

Implications and Future Directions

The paper introduces a paradigm shift in employing foundation models for respiratory health applications, showcasing their potential to streamline diagnostics and enable personalized health monitoring. The following future directions are highlighted:

  1. Fine-Tuning: Exploring data-efficient fine-tuning methods tailored for audio models could bridge the gap between limited labeled data and extensive model capabilities (a minimal sketch of one such strategy follows this list).
  2. Scaling Laws: Investigating the scaling of model size and pretraining data volume to further enhance performance, particularly as more respiratory audio datasets become available.
  3. Novel Pretraining Strategies: Advancing SSL techniques specifically adapted to the unique challenges of respiratory audio, including heterogeneous sound types and complex temporal-frequency correlations.
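As a concrete, hypothetical example of the data-efficient fine-tuning direction, one simple strategy is to freeze the pretrained encoder and update only a small task-specific head, as sketched below; this is an illustrative assumption, not a method proposed in the paper.

```python
# Data-efficient fine-tuning sketch: freeze the foundation model, train only a small head.
import torch.nn as nn

def build_finetune_model(pretrained_encoder: nn.Module, feat_dim: int, n_classes: int) -> nn.Module:
    for p in pretrained_encoder.parameters():
        p.requires_grad = False             # keep the pretrained encoder frozen
    head = nn.Linear(feat_dim, n_classes)   # only these weights receive gradient updates
    return nn.Sequential(pretrained_encoder, head)
```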

Conclusion

The introduction of OPERA marks a significant step toward the development of open-source, generalizable respiratory acoustic foundation models. The system not only provides a comprehensive dataset and a robust benchmarking framework but also elucidates the strengths and limitations of various pretraining approaches. This foundational work paves the way for future exploration and application of machine learning in respiratory health monitoring, potentially transforming the landscape of personalized healthcare.
