
ICASSP 2022 Deep Noise Suppression Challenge (2202.13288v1)

Published 27 Feb 2022 in eess.AS and cs.SD

Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020, ICASSP 2021, and INTERSPEECH 2021. We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries. We provide access to DNSMOS P.835 and word accuracy (WAcc) APIs to challenge participants to help with iterative model improvements. In this challenge, we introduced the following changes: (i) Included mobile device scenarios in the blind test set; (ii) Included a personalized noise suppression track with baseline; (iii) Added WAcc as an objective metric; (iv) Included DNSMOS P.835; (v) Made the training datasets and test sets fullband (48 kHz). We use an average of WAcc and subjective scores P.835 SIG, BAK, and OVRL to get the final score for ranking the DNS models. We believe that as a research community, we still have a long way to go in achieving excellent speech quality in challenging noisy real-world scenarios.


Summary

  • The paper provides an overview of the ICASSP 2022 Deep Noise Suppression (DNS) Challenge, detailing its objectives, methodology, and contributions to advancing deep learning-based noise suppression.
  • Key methodological adaptations in the 2022 challenge included mobile device scenarios, personalized DNS tracks using speaker profiles, and expanded objective metrics like DNSMOS P.835 and Word Accuracy (WAcc).
  • Results highlighted varied successful approaches and device-specific challenges, informing future research directions aimed at improving speech quality, handling multifaceted audio data, and optimizing computational efficiency.

Overview and Insights of the ICASSP 2022 Deep Noise Suppression Challenge

The ICASSP 2022 Deep Noise Suppression (DNS) Challenge represents a concerted effort by Microsoft researchers to push the envelope in noise suppression technology, aiming to improve perceptual speech quality in challenging environments. Since the initiation of ICASSP and INTERSPEECH-based challenges in 2020, the DNS Challenge series has emerged as a pivotal platform contributing extensively to advancements in deep learning-based noise suppression models.

Crucial Adaptations and Methodology

The 2022 challenge made significant methodological changes: it included mobile device scenarios, introduced a personalized noise suppression track, leveraged DNSMOS P.835 for non-intrusive quality predictions, and adopted Word Accuracy (WAcc) as an objective metric, among other updates. These changes broaden the scope and applicability of research outputs, ultimately refining DNS models to ensure real-world robustness.
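WAcc is conventionally defined as one minus the word error rate (WER) of an ASR system's transcript against a reference. The challenge exposed WAcc through an API, whose backend is not reproduced here; the sketch below only illustrates the generic edit-distance computation underlying the metric.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, computed via word-level Levenshtein distance.

    Illustrative only: the DNS Challenge's WAcc API uses its own ASR
    backend and reference transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance between word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return 1.0 - d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, a hypothesis that drops one word from a three-word reference yields a WAcc of roughly 0.67.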

Challenge Tracks

Two primary tracks were detailed: non-personalized DNS and personalized DNS for fullband audio. The explicit integration of speaker profiles within personalized DNS accounts for nuanced scenarios where neighboring talkers exacerbate noise conditions.

The personalized track followed a distinctive protocol: participants used enrollment speech from a diversely sampled speaker dataset, with each speaker providing a predefined clean speech segment for model adaptation. The use of speaker embeddings was encouraged, with baseline models such as RawNet2 available for wideband audio.
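A common way to use such enrollment data is to extract a fixed speaker embedding from the clean enrollment segment and compare it against embeddings of incoming audio, so the suppressor can pass the target speaker and attenuate neighboring talkers. The sketch below assumes a stand-in `embed` function; the names, threshold, and decision rule are illustrative, not the challenge baseline's actual API (which used RawNet2 as the embedding model).

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_target_speaker(enrollment_emb: np.ndarray,
                      frame_emb: np.ndarray,
                      threshold: float = 0.7) -> bool:
    """Decide whether an audio frame belongs to the enrolled speaker.

    `threshold` is a hypothetical operating point; real systems tune it
    on a development set to trade off target-speaker leakage against
    interfering-talker suppression.
    """
    return cosine_similarity(enrollment_emb, frame_emb) >= threshold
```

In a full personalized DNS model, this similarity signal (or the embedding itself) would typically be fed as a conditioning input to the suppression network rather than used as a hard gate.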

Dataset Composition and Evaluation Framework

The dataset comprised English and multilingual audio clips mixed with a wide variety of real-world noise samples, supporting the challenge's objective of fostering models that adapt to dynamic acoustic environments. The test sets were split into a development set and a blind test set, with generalization to unseen data assessed through live testing delayed by five days.

The evaluation methodology remained consistent with preceding editions, employing the ITU-T P.835 criterion alongside WAcc to holistically assess speech quality and background noise suppression. Final scoring followed a systematic ranking metric aggregating both objective and subjective indices.
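The abstract states that the final score averages WAcc with the subjective P.835 SIG, BAK, and OVRL scores. Since P.835 scores lie on a 1 to 5 MOS scale while WAcc lies in [0, 1], a rescaling step is assumed in the minimal sketch below; the paper's exact normalization and weighting may differ.

```python
def final_score(wacc: float, sig: float, bak: float, ovrl: float) -> float:
    """Average WAcc with P.835 MOS scores rescaled from [1, 5] to [0, 1].

    Assumed normalization for illustration; consult the challenge paper
    for the official ranking formula.
    """
    def to_unit(mos: float) -> float:
        return (mos - 1.0) / 4.0

    return (wacc + to_unit(sig) + to_unit(bak) + to_unit(ovrl)) / 4.0
```

Under this convention, a perfect system (WAcc of 1.0 and MOS of 5 on all three scales) scores 1.0, and a system at the floor on every axis scores 0.0.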

Results and Implications

Submissions showcased varied approaches, with the top models improving both speech intelligibility and noise suppression under heterogeneous conditions. Comparing results on mobile versus desktop data offered insight into device-specific challenges, suggesting optimizations tailored to the differing acoustics and capture pipelines of each device class.

Future Directions

Acknowledging the strides made, the paper notes that excellent speech quality in challenging noisy scenarios remains unattained, underscoring the ongoing nature of DNS research. Subsequent challenges aim to integrate more varied audio data across languages, accents, and devices to simulate realistic use cases, accompanied by stricter model profiling for computational efficiency.

Conclusions

The ICASSP 2022 DNS Challenge enriches the auditory machine learning domain with data-intensive evaluations, sets methodological benchmarks, and propels future research leveraging deep learning for real-world communication systems. Researchers are equipped with comprehensive datasets, reference models, and evaluation setups that invite further innovation in DNS applications, charting a trajectory of continued advancement in speech enhancement.