The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge (2404.16619v1)

Published 25 Apr 2024 in cs.SD and eess.AS

Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by the THU-HCSI team for the LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. To further improve speaker similarity and speech quality, we introduce a speaker-aware text encoder and a flow-based decoder with Transformer blocks. In addition, we denoise the few-shot data, mix it with the pre-training data, and adopt a speaker-balanced sampling strategy to ensure effective fine-tuning for the target speakers. The official evaluations in Track 1 show that our system achieves the best speaker similarity MOS of 4.25 and a respectable naturalness MOS of 3.97.
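The abstract does not say how the few-shot data are mixed with the pre-training data or how speaker-balanced sampling is implemented. As an illustration only, here is a minimal sketch assuming the simplest interpretation: the (already denoised) few-shot utterances are pooled with the pre-training utterances, and each training item is drawn by first picking a speaker uniformly and then picking one of that speaker's utterances uniformly. The record format and the names `mix_datasets`, `group_by_speaker`, and `speaker_balanced_batches` are hypothetical, and the denoising step itself is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical record format: (speaker_id, utterance_id).
Utterance = Tuple[str, str]


def mix_datasets(pretrain: Sequence[Utterance], few_shot: Sequence[Utterance]) -> List[Utterance]:
    """Pool pre-training utterances with (assumed already denoised) few-shot utterances."""
    return list(pretrain) + list(few_shot)


def group_by_speaker(pool: Sequence[Utterance]) -> Dict[str, List[Utterance]]:
    """Index the pooled utterances by speaker ID."""
    groups: Dict[str, List[Utterance]] = defaultdict(list)
    for utt in pool:
        groups[utt[0]].append(utt)
    return dict(groups)


def speaker_balanced_batches(
    pool: Sequence[Utterance], batch_size: int, num_batches: int, seed: int = 0
) -> Iterator[List[Utterance]]:
    """Yield batches in which every speaker is equally likely to be drawn.

    Each item is sampled in two stages: a speaker uniformly at random, then one
    of that speaker's utterances uniformly at random (with replacement). A target
    speaker with only a few clips is therefore seen about as often as a
    pre-training speaker with thousands of clips.
    """
    groups = group_by_speaker(pool)
    if not groups:
        raise ValueError("cannot sample from an empty pool")
    speakers = sorted(groups)  # sorted for reproducibility under a fixed seed
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            speaker = rng.choice(speakers)
            batch.append(rng.choice(groups[speaker]))
        yield batch
```

With 1,000 pre-training utterances from one speaker and 5 few-shot utterances from a target speaker, plain uniform sampling over utterances would select the target speaker less than 0.5% of the time, whereas this two-stage scheme selects it about half the time. Sampling with replacement is a design choice of this sketch; other balancing schemes (for example, per-speaker weights or oversampling) would serve the same purpose.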

References (5)
  1. “LIMMITS’24: Multi-speaker, Multi-lingual Indic TTS with voice cloning,” submitted to ICASSP 2024, 2024, https://sites.google.com/view/limmits24/.
  2. “YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone,” in International Conference on Machine Learning. PMLR, 2022, pp. 2709–2720.
  3. “VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design,” in Proc. INTERSPEECH 2023, 2023, pp. 4374–4378.
  4. “FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 7857–7861.
  5. “Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation,” arXiv preprint arXiv:2210.15868, 2022.
