
L1-aware Multilingual Mispronunciation Detection Framework (2309.07719v2)

Published 14 Sep 2023 in cs.CL, cs.SD, and eess.AS

Abstract: The phonological discrepancies between a speaker's native language (L1) and the non-native language (L2) serve as a major factor in mispronunciation. This paper introduces a novel multilingual mispronunciation detection and diagnosis (MDD) architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechanism is deployed to align the input audio with the reference phoneme sequence. Afterwards, the L1-L2 speech embeddings are extracted from an auxiliary model, pretrained in a multi-task setup to identify the L1 and L2 languages, and are infused into the primary network. Finally, L1-MultiMDD is optimized for a unified multilingual phoneme recognition task using connectionist temporal classification (CTC) loss for the target languages: English, Arabic, and Mandarin. Our experiments demonstrate the effectiveness of the proposed L1-MultiMDD framework on both seen datasets (L2-ARCTIC, LATIC, and AraVoiceL2v2) and unseen datasets (EpaDB and Speechocean762). The consistent gains in phoneme error rate (PER) and false rejection rate (FRR) across all target languages confirm our approach's robustness, efficacy, and generalizability.
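The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names are hypothetical, PER is taken here as Levenshtein distance divided by reference length, and FRR follows the common MDD convention of false rejections over all correctly pronounced phones. The FRR helper also assumes, for simplicity, that the canonical, annotated, and predicted sequences are already aligned position by position.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def false_rejection_rate(canonical: Sequence[str],
                         annotated: Sequence[str],
                         predicted: Sequence[str]) -> float:
    """FRR = false rejections / (true acceptances + false rejections).

    Assumption (hypothetical simplification): the three sequences are
    pre-aligned and of equal length.
    """
    if not (len(canonical) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be pre-aligned to equal length")
    true_accept = false_reject = 0
    for c, a, p in zip(canonical, annotated, predicted):
        if a == c:  # the speaker pronounced this phone correctly
            if p == c:
                true_accept += 1
            else:
                false_reject += 1  # model flagged a correct phone
    total = true_accept + false_reject
    return false_reject / total if total else 0.0


canonical = ["DH", "AH", "K", "AE", "T"]
annotated = ["D", "AH", "K", "AE", "T"]   # speaker substituted D for DH
predicted = ["D", "AH", "K", "EH", "T"]   # model misrecognized AE as EH

print(phoneme_error_rate(annotated, predicted))
print(false_rejection_rate(canonical, annotated, predicted))
```

In this toy example, lower values are better for both metrics, which is the sense in which the abstract reports gains.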

