Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 126 tok/s
Gemini 2.5 Pro 45 tok/s Pro
GPT-5 Medium 29 tok/s Pro
GPT-5 High 32 tok/s Pro
GPT-4o 127 tok/s Pro
Kimi K2 183 tok/s Pro
GPT OSS 120B 425 tok/s Pro
Claude Sonnet 4.5 38 tok/s Pro
2000 character limit reached

Robust One-Shot Singing Voice Conversion (2210.11096v2)

Published 20 Oct 2022 in cs.SD, cs.LG, and eess.AS

Abstract: Recent progress in deep generative models has improved the quality of voice conversion in the speech domain. However, high-quality singing voice conversion (SVC) of unseen singers remains challenging due to the wider variety of musical expressions in pitch, loudness, and pronunciation. Moreover, singing voices are often recorded with reverb and accompaniment music, which make SVC even more challenging. In this work, we present a robust one-shot SVC (ROSVC) that performs any-to-any SVC robustly even on such distorted singing voices. To this end, we first propose a one-shot SVC model based on generative adversarial networks that generalizes to unseen singers via partial domain conditioning and learns to accurately recover the target pitch via pitch distribution matching and AdaIN-skip conditioning. We then propose a two-stage training method called Robustify that train the one-shot SVC model in the first stage on clean data to ensure high-quality conversion, and introduces enhancement modules to the encoders of the model in the second stage to enhance the feature extraction from distorted singing voices. To further improve the voice quality and pitch reconstruction accuracy, we finally propose a hierarchical diffusion model for singing voice neural vocoders. Experimental results show that the proposed method outperforms state-of-the-art one-shot SVC baselines for both seen and unseen singers and significantly improves the robustness against distortions.

Citations (7)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.