Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models (2210.06936v1)

Published 13 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Labeled audio data is insufficient to build satisfactory speech recognition systems for most of the world's languages. Some zero-resource methods attempt phoneme- or word-level speech recognition without labeled audio data in the target language, but their error rates are usually too high for real-world use. Recently, the representation ability of self-supervised pre-trained models has been found to be extremely beneficial for zero-resource phoneme recognition. To the best of our knowledge, this paper is the first attempt to extend the use of pre-trained models to word-level zero-resource speech recognition. This is done by fine-tuning the pre-trained models on IPA phoneme transcriptions and decoding with a language model trained on extra texts. Experiments with Wav2vec 2.0 and HuBERT models show that this method can achieve a word error rate below 20% on some languages, with an average error rate of 33.77% across 8 languages.
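
The two-step recipe the abstract describes (CTC fine-tuning of a self-supervised acoustic model on IPA phoneme targets, then beam-search decoding with an externally trained language model) could look roughly like the sketch below, written against the Hugging Face transformers API. This is not the authors' code: the checkpoint name, the vocab.json contents, the toy waveform, and the phoneme string are all illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of fine-tuning a
# self-supervised acoustic model on IPA phoneme transcriptions with a
# CTC head. Checkpoint, vocab file, and data here are assumptions.
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Hypothetical vocab.json mapping single-symbol IPA phonemes to ids,
# e.g. {"[PAD]": 0, "[UNK]": 1, "|": 2, "a": 3, "b": 4, "k": 5, "u": 6}
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# A multilingual self-supervised checkpoint; the CTC output layer on
# top is randomly initialized and learned during fine-tuning.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional front end fixed

# One illustrative training step on a (waveform, IPA transcription) pair.
waveform = torch.randn(16000)  # placeholder: 1 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("b a k u", return_tensors="pt").input_ids  # toy IPA string
loss = model(inputs.input_values, labels=labels).loss
loss.backward()

# For the second step (decoding with a language model trained on extra
# text), the CTC logits could be passed to a beam-search decoder such
# as pyctcdecode together with an n-gram LM, for example:
#   from pyctcdecode import build_ctcdecoder
#   decoder = build_ctcdecoder(vocab_tokens, kenlm_model_path="lm.arpa")
#   phonemes = decoder.decode(
#       model(inputs.input_values).logits[0].detach().numpy())
```

In practice the fine-tuning loop would iterate over batches of real audio with padding and an attention mask; the snippet above only shows the shape of a single step under the stated assumptions.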
