Attention-Guided Adaptation for Code-Switching Speech Recognition (2312.08856v2)

Published 14 Dec 2023 in eess.AS and cs.SD

Abstract: The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle in the code-switching setting, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations for each language. Other methods have modeled the switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects the attention heads in a model that most closely express language identities and then guides those heads to attend correctly to their corresponding languages. Experiments on the Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing the state-of-the-art method, while training only 5.6% additional parameters over Whisper.
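The mixed error rate reported in the abstract is commonly computed by scoring Mandarin at the character level and English at the word level, then dividing the total edit distance by the number of reference tokens. The following is a minimal sketch under that assumption; the function names (`tokenize_mixed`, `edit_distance`, `mixed_error_rate`) and the tokenization regex are illustrative choices, not the paper's evaluation code, and the paper's exact text normalization is not stated here.

```python
import re


def tokenize_mixed(text):
    # Assumption: each CJK character is one token, and each run of
    # Latin letters, digits, or apostrophes is one (lowercased) token.
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9']+", text.lower())


def edit_distance(ref, hyp):
    # Standard Levenshtein distance over token lists
    # (substitutions, insertions, and deletions all cost 1).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(refs, hyps):
    # Total token-level errors divided by total reference tokens.
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize_mixed(ref)
        hyp_tokens = tokenize_mixed(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total if total else 0.0


if __name__ == "__main__":
    # One Mandarin character is dropped out of four reference tokens.
    print(mixed_error_rate(["我想要 coffee"], ["我想 coffee"]))
```

In this toy example the reference has three Mandarin characters and one English word, so a single dropped character yields one error over four tokens.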

