LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models (2311.16604v1)

Published 28 Nov 2023 in eess.AS and cs.LG

Abstract: The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to bring performance improvements to downstream SV systems due to artifacts in the predicted signals of SE models. To compensate for artifacts, we propose a generic denoising framework named LC4SV, which can serve as a pre-processor for various unknown downstream SV models. In LC4SV, we employ a learning-based interpolation agent to automatically generate the appropriate coefficients between the enhanced signal and its noisy input to improve SV performance in noisy environments. Our experimental results demonstrate that LC4SV consistently improves the performance of various unseen SV systems. To the best of our knowledge, this work is the first attempt to develop a learning-based interpolation scheme aiming at improving SV performance in noisy environments.
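The central operation described in the abstract, blending the enhanced signal with its noisy input using a learned coefficient before passing the result to an unseen SV model, can be illustrated with a minimal sketch. The names below (InterpolationAgent, compensate), the feature dimensions, the mean pooling, and the feed-forward predictor are illustrative assumptions for exposition only; they are not the architecture or training procedure used in LC4SV.

```python
import torch
import torch.nn as nn


class InterpolationAgent(nn.Module):
    """Hypothetical agent that predicts a blending coefficient alpha from
    utterance-level features of the noisy and enhanced signals.
    Illustrative stand-in, not the LC4SV agent."""

    def __init__(self, feat_dim: int = 80, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # constrain alpha to [0, 1]
        )

    def forward(self, noisy_feats: torch.Tensor, enhanced_feats: torch.Tensor) -> torch.Tensor:
        # noisy_feats, enhanced_feats: (batch, time, feat_dim)
        pooled = torch.cat(
            [noisy_feats.mean(dim=1), enhanced_feats.mean(dim=1)], dim=-1
        )  # (batch, 2 * feat_dim)
        return self.net(pooled)  # (batch, 1)


def compensate(noisy_wav: torch.Tensor, enhanced_wav: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Interpolate between the enhanced signal and its noisy input:
    output = alpha * enhanced + (1 - alpha) * noisy."""
    return alpha * enhanced_wav + (1.0 - alpha) * noisy_wav


if __name__ == "__main__":
    agent = InterpolationAgent()
    noisy_feats = torch.randn(2, 300, 80)     # e.g. log-mel features of the noisy input
    enhanced_feats = torch.randn(2, 300, 80)  # features of the SE model's output
    alpha = agent(noisy_feats, enhanced_feats)
    noisy_wav = torch.randn(2, 48000)
    enhanced_wav = torch.randn(2, 48000)
    blended = compensate(noisy_wav, enhanced_wav, alpha)  # signal fed to the downstream SV model
    print(blended.shape)  # torch.Size([2, 48000])
```

In the paper, the coefficient is produced by a learning-based agent trained so that the blended signal improves SV performance on unseen verification models; the simple predictor above is only a placeholder for that component.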
