MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting (2305.11926v1)

Published 19 May 2023 in cs.SD, cs.CL, cs.LG, and eess.AS

Abstract: We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech (TTS) synthesis model that can produce high-quality speech. Benefiting from a modularized training paradigm exploiting self-supervised speech representations, MParrotTTS adapts to a new language with minimal supervised data and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on any bilingual or parallel examples, MParrotTTS can transfer voices across languages while preserving the speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results on six languages in terms of speech naturalness and speaker similarity in parallel and cross-lingual synthesis. The proposed model outperforms the state-of-the-art multilingual TTS models and baselines, using only a small fraction of supervised training data. Speech samples from our model can be found at https://paper2438.github.io/tts/

Authors (5)
  1. Neil Shah (87 papers)
  2. Vishal Tambrahalli (3 papers)
  3. Saiteja Kosgi (4 papers)
  4. Niranjan Pedanekar (6 papers)
  5. Vineet Gandhi (41 papers)
