
NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping (2309.14521v2)

Published 25 Sep 2023 in eess.AS and cs.SD

Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. By comparison, DNN-based methods deliver higher quality, but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering, resulting in a causal low-complexity model. A shortcoming of the LACE model, however, is that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model, resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9, and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.
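The core idea of adaptive temporal shaping — predicting a time-varying gain envelope at frame rate, upsampling it to sample rate, and applying it pointwise to the decoded signal — can be sketched in a few lines. This is an illustrative toy, not the paper's actual module: the real shaping gains in NoLACE are produced by a learned nonlinear network from codec features, and all names below (`upsample_gains`, `temporal_shaping`, the frame size) are hypothetical.

```python
# Illustrative sketch of adaptive temporal shaping (not the paper's
# implementation): frame-rate gains are linearly interpolated up to
# sample rate and applied pointwise, giving the enhancer fine-grained
# temporal control at low computational cost.

def upsample_gains(frame_gains, frame_size):
    """Linearly interpolate frame-rate gains to sample rate."""
    gains = []
    for i in range(len(frame_gains) - 1):
        g0, g1 = frame_gains[i], frame_gains[i + 1]
        for n in range(frame_size):
            t = n / frame_size
            gains.append((1 - t) * g0 + t * g1)
    gains.extend([frame_gains[-1]] * frame_size)  # hold the last frame
    return gains

def temporal_shaping(signal, frame_gains, frame_size):
    """Apply a time-varying gain envelope to a decoded signal."""
    gains = upsample_gains(frame_gains, frame_size)
    return [s * g for s, g in zip(signal, gains)]

# Two frames of 4 samples; gain ramps from 0.5 toward 1.0.
signal = [1.0] * 8
shaped = temporal_shaping(signal, [0.5, 1.0], frame_size=4)
print(shaped)  # [0.5, 0.625, 0.75, 0.875, 1.0, 1.0, 1.0, 1.0]
```

In the actual model, such a shaping stage complements the long-term/short-term adaptive filtering inherited from LACE, which operates at a coarser temporal resolution.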
