The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains (2310.02640v3)

Published 4 Oct 2023 in eess.AS

Abstract: We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seven different countries participated. Surprisingly, we found that the two sub-tracks of French text-to-speech synthesis had large differences in their predictability, and that singing voice-converted samples were not as difficult to predict as we had expected. Use of diverse datasets and listener information during training appeared to be successful approaches.

