Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training (2008.04265v2)

Published 10 Aug 2020 in eess.AS and cs.SD

Abstract: Data-efficient voice cloning aims to synthesize a target speaker's voice from only a few enrollment samples. To this end, speaker adaptation and speaker encoding are two typical methods built on a base model trained on multiple speakers. The former uses a small set of target speaker data to transfer the multi-speaker model to the target speaker's voice through direct model updates, while in the latter, only a few seconds of the target speaker's audio are passed through an extra speaker encoding model, alongside the multi-speaker model, to synthesize the target speaker's voice without any model update. Both methods, however, require clean target speaker data, whereas the samples provided by users in real applications inevitably contain acoustic noise, and generating the target voice from such noisy data remains challenging. In this paper, we study the data-efficient voice cloning problem from noisy samples under the sequence-to-sequence TTS paradigm. Specifically, we introduce domain adversarial training (DAT) to both speaker adaptation and speaker encoding, aiming to disentangle noise from the speech-noise mixture. Experiments show that for both speaker adaptation and speaker encoding, the proposed approaches consistently synthesize clean speech from noisy speaker samples, clearly outperforming a method that adopts a state-of-the-art speech enhancement module.
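To make the DAT idea concrete, below is a minimal PyTorch sketch of the gradient reversal mechanism commonly used to implement domain adversarial training: an auxiliary classifier tries to predict the acoustic condition (e.g., clean vs. noisy) from an intermediate embedding, while reversed gradients push the upstream encoder toward condition-invariant features. The module names, embedding dimension, and two-domain setup here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the
    backward pass. This is the standard building block of DAT."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient: the encoder is trained to *confuse*
        # the domain classifier, encouraging noise-invariant features.
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    """Hypothetical auxiliary head predicting the acoustic domain
    (0 = clean, 1 = noisy) from a speaker/utterance embedding."""

    def __init__(self, dim=256, n_domains=2, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.head = nn.Sequential(
            nn.Linear(dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_domains),
        )

    def forward(self, embedding):
        reversed_emb = GradientReversal.apply(embedding, self.lambd)
        return self.head(reversed_emb)

# Sketch of the training step: the domain loss is added to the usual
# TTS reconstruction loss, and its reversed gradient flows into the encoder.
if __name__ == "__main__":
    embedding = torch.randn(8, 256, requires_grad=True)  # stand-in encoder output
    domain_labels = torch.randint(0, 2, (8,))            # clean/noisy labels
    clf = DomainClassifier()
    domain_loss = nn.CrossEntropyLoss()(clf(embedding), domain_labels)
    domain_loss.backward()  # embedding.grad now carries the reversed signal
```

In a full system this domain loss would be combined with the synthesis loss, so that the speaker representation retains speaker identity while shedding the noise characteristics of the enrollment samples.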

Authors (5)
  1. Jian Cong (16 papers)
  2. Shan Yang (58 papers)
  3. Lei Xie (337 papers)
  4. Guoqiao Yu (3 papers)
  5. Guanglu Wan (24 papers)
Citations (27)
