Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model (2011.05038v1)

Published 10 Nov 2020 in eess.AS and cs.SD

Abstract: High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an encoder-decoder neural network to automatically enhance low-quality recordings to professional high-quality recordings. To address channel variability, we first filter out the channel characteristics from the original input audio using the encoder network with adversarial training. Next, we disentangle the channel factor from a reference audio. Conditioned on this factor, an auto-regressive decoder is then used to predict the target-environment Mel spectrogram. Finally, we apply a neural vocoder to synthesize the speech waveform. Experimental results show that the proposed system can generate a professional high-quality speech waveform when setting high-quality audio as the reference. It also improves speech enhancement performance compared with several state-of-the-art baseline systems.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Haoyu Li (56 papers)
  2. Yang Ai (41 papers)
  3. Junichi Yamagishi (178 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.