Remixed2Remixed: Domain adaptation for speech enhancement by Noise2Noise learning with Remixing (2312.16836v1)

Published 28 Dec 2023 in cs.SD and eess.AS

Abstract: This paper proposes Remixed2Remixed, a domain adaptation method for speech enhancement, which adopts Noise2Noise (N2N) learning to adapt models trained on artificially generated (out-of-domain: OOD) noisy-clean pair data to better separate real-world recorded (in-domain) noisy data. The proposed method uses a teacher model trained on OOD data to acquire pseudo-in-domain speech and noise signals, which are shuffled and remixed twice in each batch to generate two bootstrapped mixtures. The student model is then trained by optimizing an N2N-based cost function computed using these two bootstrapped mixtures. As the training strategy is similar to the recently proposed RemixIT, we also investigate the effectiveness of the N2N-based loss as a regularizer for RemixIT. Experimental results on the CHiME-7 unsupervised domain adaptation for conversational speech enhancement (UDASE) task revealed that the proposed method outperformed the challenge baseline system, RemixIT, and reduced the performance variability caused by the choice of teacher model.
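
The training loop described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is a minimal illustration under stated assumptions, not the paper's exact formulation: the teacher/student interfaces, the mean-squared-error form of the N2N cost, and remixing by shuffling only the pseudo-noise within the batch are all hypothetical simplifications.

```python
import torch

def remixed2remixed_step(teacher, student, noisy_batch, optimizer):
    # noisy_batch: (B, T) batch of in-domain noisy waveforms.
    # teacher, student: hypothetical models mapping a waveform batch to a
    # (speech_estimate, noise_estimate) pair; the real architectures and
    # loss follow the paper, which this sketch only approximates.
    with torch.no_grad():
        # The teacher, pre-trained on OOD noisy-clean pairs, produces
        # pseudo-in-domain speech and noise estimates.
        s_hat, n_hat = teacher(noisy_batch)

    b = noisy_batch.shape[0]
    # Shuffle and remix the pseudo-noise twice within the batch, yielding
    # two bootstrapped mixtures that share the same pseudo-speech.
    mix1 = s_hat + n_hat[torch.randperm(b)]
    mix2 = s_hat + n_hat[torch.randperm(b)]

    # Noise2Noise-style cost: map one bootstrapped mixture toward the
    # other; with independent, zero-mean remixed noises, this drives the
    # student toward the shared (pseudo-)clean speech.
    est_speech, _ = student(mix1)
    loss = torch.mean((est_speech - mix2) ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because this objective needs only the two bootstrapped mixtures, it can also be added as a regularization term to the RemixIT loss, which is the variant the paper investigates alongside the standalone method.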

References (29)
  1. P. Ochieng, “Deep neural network techniques for monaural speech enhancement: State of the art analysis,” arXiv preprint arXiv:2212.00369, 2022.
  2. C. Macartney and T. Weyde, “Improved speech enhancement with the Wave-U-Net,” arXiv preprint arXiv:1811.11307, 2018.
  3. A. Défossez, G. Synnaeve, and Y. Adi, “Real Time Speech Enhancement in the Waveform Domain,” in Proc. Interspeech, pp. 3291–3295, 2020.
  4. Y. Luo and N. Mesgarani, “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation,” IEEE/ACM Trans. ASLP, vol. 27, no. 8, pp. 1256–1266, 2019.
  5. E. Tzinis, Z. Wang, and P. Smaragdis, “Sudo rm -rf: Efficient networks for universal audio source separation,” in Proc. MLSP, pp. 1–6, 2020.
  6. S. Zhao, T. H. Nguyen, and B. Ma, “Monaural speech enhancement with complex convolutional block attention module and joint time-frequency losses,” in Proc. ICASSP, pp. 6648–6652, 2021.
  7. N. Ito and M. Sugiyama, “Audio Signal Enhancement with Learning from Positive and Unlabeled Data,” in Proc. ICASSP, pp. 1–5, 2023.
  8. A. S. Subramanian, X. Wang, M. K. Baskar, S. Watanabe, T. Taniguchi, D. Tran, and Y. Fujita, “Speech enhancement using end-to-end speech recognition objectives,” in Proc. WASPAA, pp. 234–238, 2019.
  9. S. W. Fu, C. Yu, K. H. Hung, M. Ravanelli, and Y. Tsao, “MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech,” in Proc. ICASSP, pp. 7412–7416, 2022.
  10. S. Wisdom, E. Tzinis, H. Erdogan, R. Weiss, K. Wilson, and J. Hershey, “Unsupervised sound separation using mixture invariant training,” in Proc. NeurIPS, vol. 33, pp. 3846–3857, 2020.
  11. K. Saijo and T. Ogawa, “Self-Remixing: Unsupervised speech separation via separation and remixing,” in Proc. ICASSP, pp. 1–5, 2023.
  12. C. F. Liao, Y. Tsao, H. Y. Lee, and H. M. Wang, “Noise Adaptive Speech Enhancement Using Domain Adversarial Training,” in Proc. Interspeech, pp. 3148–3152, 2019.
  13. H. Y. Lin, H. H. Tseng, X. Lu, and Y. Tsao, “Unsupervised noise adaptive speech enhancement by discriminator-constrained optimal transport,” in Proc. NeurIPS, vol. 34, pp. 19935–19946, 2021.
  14. E. Tzinis, Y. Adi, V. K. Ithapu, B. Xu, P. Smaragdis, and A. Kumar, “RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing,” IEEE JSTSP, vol. 16, no. 6, pp. 1329–1341, 2022.
  15. J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2Noise: Learning image restoration without clean data,” in Proc. ICML, pp. 2965–2974, 2018.
  16. M. M. Kashyap, A. Tambwekar, K. Manohara, and S. Natarajan, “Speech Denoising Without Clean Training Data: A Noise2Noise Approach,” in Proc. Interspeech, pp. 2716–2720, 2021.
  17. N. Moran, D. Schmidt, Y. Zhong, and P. Coady, “Noisier2Noise: Learning to denoise from unpaired noisy data,” in Proc. CVPR, pp. 12064–12072, 2020.
  18. T. Pang, H. Zheng, Y. Quan, and H. Ji, “Recorrupted-to-Recorrupted: Unsupervised deep learning for image denoising,” in Proc. CVPR, pp. 2043–2052, 2021.
  19. T. Fujimura, Y. Koizumi, K. Yatabe, and R. Miyazaki, “Noisy-target training: A training strategy for DNN-based speech enhancement without clean speech,” in Proc. EUSIPCO, pp. 436–440, 2021.
  20. A. Sivaraman, S. Kim, and M. Kim, “Personalized speech enhancement through self-supervised data augmentation and purification,” in Proc. Interspeech, pp. 2676–2680, 2021.
  21. T. Fujimura and T. Toda, “Analysis of noisy-target training for DNN-based speech enhancement,” in Proc. ICASSP, pp. 1–5, 2023.
  22. S. Leglaive, L. Borne, E. Tzinis, M. Sadeghi, M. Fraticelli, S. Wisdom, M. Pariente, D. Pressnitzer, and J. R. Hershey, “The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement,” arXiv preprint arXiv:2307.03533, 2023.
  23. Website of CHiME-7 Task 2 UDASE: https://www.chimechallenge.org/current/task2/index (last access: Sep. 4, 2023)
  24. J. Cosentino, M. Pariente, S. Cornell, A. Deleforge, and E. Vincent, “LibriMix: An open-source dataset for generalizable speech separation,” arXiv preprint arXiv:2005.11262, 2020.
  25. V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “LibriSpeech: an ASR corpus based on public domain audio books,” in Proc. ICASSP, pp. 5206–5210, 2015.
  26. G. Wichern, J. Antognini, M. Flynn, L. R. Zhu, E. McQuinn, D. Crow, E. Manilow, and J. Le Roux, “WHAM!: Extending speech separation to noisy environments,” in Proc. Interspeech, pp. 1368–1372, 2019.
  27. J. Barker, S. Watanabe, E. Vincent, and J. Trmal, “The fifth ’CHiME’ speech separation and recognition challenge: Dataset, task and baselines,” in Proc. Interspeech, pp. 1561–1565, 2018.
  28. J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR – half-baked or well done?” in Proc. ICASSP, pp. 626–630, 2019.
  29. C. K. Reddy, V. Gopal, and R. Cutler, “DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors,” in Proc. ICASSP, pp. 886–890, 2022.