- The paper introduces a novel single-codebook design that enhances TTS efficiency, outperforming multi-codebook codecs on metrics like STOI and PESQ.
- It employs a VQ-VAE architecture with a Conformer encoder and a convolutional decoder to compactly encode mel spectrograms while extracting robust phonetic features.
- Extensive experiments show that Single-Codec achieves high-quality, zero-shot speech synthesis at a reduced bandwidth of 304 bps while improving intelligibility.
The paper "Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation" (2406.07422) introduces a novel approach named Single-Codec for efficient speech generation. Typical multi-codebook speech codecs utilized in LLM-based TTS systems suffer from inefficiency due to multi-sequence discrete representation. Single-Codec offers a solution by employing a single-codebook representation, enhancing performance while reducing bandwidth.
Methodology
Architecture of Single-Codec
Single-Codec is built on a VQ-VAE architecture that encodes and reconstructs the mel spectrogram rather than the raw waveform, allowing speech information to be preserved more compactly and efficiently. A Conformer-based encoder extracts latent features, vector quantization with a single codebook discretizes them into speech codes, and a convolution-based decoder reconstructs a high-quality mel spectrogram, as sketched below.
Figure 1: The architecture of Single-Codec.
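To make the pipeline concrete, here is a minimal PyTorch sketch of the encode-quantize-decode loop. The module sizes, the stand-in encoder and decoder, the codebook size, and the loss weighting are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Single-codebook VQ layer with a straight-through estimator."""
    def __init__(self, codebook_size=8192, dim=256):  # codebook size is an assumption
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):                               # z: (B, T, D)
        flat = z.reshape(-1, z.size(-1))                 # (B*T, D)
        dist = torch.cdist(flat, self.codebook.weight)   # distance to every codeword
        codes = dist.argmin(-1).view(z.shape[:2])        # (B, T) discrete speech codes
        z_q = self.codebook(codes)
        codebook_loss = F.mse_loss(z_q, z.detach())      # move codewords toward encodings
        commit_loss = F.mse_loss(z, z_q.detach())        # keep encodings near codewords
        z_q = z + (z_q - z).detach()                     # straight-through gradient
        return z_q, codes, codebook_loss + 0.25 * commit_loss

class SingleCodebookCodec(nn.Module):
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        # Stand-in for the Conformer-based encoder (a real Conformer block
        # combines self-attention and convolution; omitted here for brevity).
        self.encoder = nn.Sequential(nn.Linear(n_mels, dim), nn.GELU(), nn.Linear(dim, dim))
        self.vq = VectorQuantizer(dim=dim)
        # Stand-in for the convolution-based decoder.
        self.decoder = nn.Sequential(
            nn.Conv1d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv1d(dim, n_mels, 3, padding=1))

    def forward(self, mel):                              # mel: (B, T, n_mels)
        z = self.encoder(mel)
        z_q, codes, vq_loss = self.vq(z)
        mel_hat = self.decoder(z_q.transpose(1, 2)).transpose(1, 2)
        loss = F.l1_loss(mel_hat, mel) + vq_loss
        return mel_hat, codes, loss
```

Because the codec operates in the mel domain, a separate vocoder would still be needed to synthesize a waveform from the reconstructed mel spectrogram.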
Encoders and Decoders
The suite of enhancements includes a global reference encoder that decouples time-invariant information (e.g., timbre) from the content, leaving the single-codebook discrete units freer to capture phonetic information. The reference encoder processes input segments of 600 frames, yielding robust global features that summarize these utterance-level acoustic details.
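A hedged sketch of what such a reference encoder might look like: it pools a fixed 600-frame mel segment into a single utterance-level embedding, so time-invariant information need not be carried by the discrete codes. The layer choices are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    """Pools a 600-frame mel segment into one global embedding."""
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, dim, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv1d(dim, dim, 3, stride=2, padding=1), nn.GELU())
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, mel_segment):               # (B, 600, n_mels)
        h = self.conv(mel_segment.transpose(1, 2)).transpose(1, 2)
        _, g = self.gru(h)                        # last hidden state summarizes the segment
        return g.squeeze(0)                       # (B, dim) time-invariant embedding
```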
Contextual and Hybrid Sampling Modules
To strengthen speech content modeling, a BLSTM module is integrated to capture contextual correlations between adjacent frames. In addition, a hybrid sampling module combines convolution with pooling for downsampling and transposed convolution with replication for upsampling, reducing the information loss that conventional single-mode sampling introduces, as sketched below.
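The following sketch illustrates the hybrid idea under stated assumptions: each resolution change pairs a learned path (strided or transposed convolution) with a parameter-free path (average pooling or frame replication). Combining the two paths by summation is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class HybridDownsample(nn.Module):
    """Strided convolution plus average pooling, combined by summation."""
    def __init__(self, dim, stride=2):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=2 * stride, stride=stride, padding=stride // 2)
        self.pool = nn.AvgPool1d(kernel_size=stride, stride=stride)

    def forward(self, x):                # x: (B, D, T), T divisible by stride
        return self.conv(x) + self.pool(x)

class HybridUpsample(nn.Module):
    """Transposed convolution plus frame replication, combined by summation."""
    def __init__(self, dim, stride=2):
        super().__init__()
        self.stride = stride
        self.deconv = nn.ConvTranspose1d(dim, dim, kernel_size=2 * stride, stride=stride, padding=stride // 2)

    def forward(self, x):                # x: (B, D, T)
        return self.deconv(x) + x.repeat_interleave(self.stride, dim=-1)
```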
Single-Codec also incorporates a resampling module aimed at refining the phonetic relevance of extracted features. By downsampling for local modeling and preserving detail through residual connections, the module yields features with lower temporal variance but higher phonetic fidelity, which encourages better clustering within the codebook (see the sketch below).
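As a rough illustration of the resample-with-residual idea (the details here are assumptions, not the paper's exact design):

```python
import torch.nn as nn

class ResampleBlock(nn.Module):
    """Downsample -> local modeling -> upsample, with a residual connection."""
    def __init__(self, dim, stride=2):
        super().__init__()
        self.down = nn.Conv1d(dim, dim, 2 * stride, stride=stride, padding=stride // 2)
        self.local = nn.Sequential(nn.Conv1d(dim, dim, 3, padding=1), nn.GELU())
        self.up = nn.ConvTranspose1d(dim, dim, 2 * stride, stride=stride, padding=stride // 2)

    def forward(self, x):                 # x: (B, D, T), T divisible by stride
        # The low-rate branch captures slower, more phoneme-like structure;
        # the residual keeps fine detail so the codebook sees stabler features.
        return x + self.up(self.local(self.down(x)))
```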
Figure 2: The commitment loss of different codecs during training.
Experiments and Evaluation
Evaluation Metrics
The performance of Single-Codec is assessed through extensive evaluations using metrics such as STOI, PESQ, MCD, UTMOS, and speaker similarity (SPK). Together these capture objective reconstruction fidelity as well as automatic proxies for subjective listening quality across codec settings.
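As an illustration, two of the objective metrics, STOI and PESQ, can be computed with common open-source packages (`pip install pystoi pesq`). This mirrors the metric definitions and is not the paper's evaluation code.

```python
import numpy as np
from pystoi import stoi     # Short-Time Objective Intelligibility
from pesq import pesq       # ITU-T P.862 Perceptual Evaluation of Speech Quality

def codec_metrics(ref: np.ndarray, deg: np.ndarray, fs: int = 16000) -> dict:
    """Score a codec's output `deg` against the reference waveform `ref`.

    Both arrays should contain real speech sampled at `fs` Hz
    (PESQ supports only 8 kHz narrowband and 16 kHz wideband input).
    """
    return {
        "stoi": stoi(ref, deg, fs, extended=False),  # ~1.0 = fully intelligible
        "pesq": pesq(fs, ref, deg, "wb"),            # wideband mode, up to ~4.5
    }
```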
Compared with established multi-codebook codecs such as EnCodec and TiCodec, Single-Codec achieves superior reconstruction quality at a bandwidth of only 304 bps. Beyond avoiding the drawbacks of multi-sequence modeling, the single-sequence approach delivers higher intelligibility and more natural synthesized speech.
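For intuition, a single-codebook bitrate is just tokens per second times bits per token. The codebook size below is an illustrative assumption chosen to be consistent with 304 bps, not a value confirmed by the paper:

```python
import math

codebook_size = 8192                        # assumed single codebook with 8192 entries
bits_per_token = math.log2(codebook_size)   # 13 bits per discrete code
token_rate = 304 / bits_per_token           # ≈ 23.4 codes per second

print(f"{token_rate:.1f} codes/s x {bits_per_token:.0f} bits/code "
      f"= {token_rate * bits_per_token:.0f} bps")
```

Under these assumptions, an LLM-based TTS model would predict only about 23 tokens per second of speech in a single stream, rather than several parallel streams as with multi-codebook codecs.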
Zero-shot TTS and Ablation Studies
Empirical results confirm that Single-Codec enables high-quality TTS, particularly in zero-shot scenarios. Ablation studies further isolate the contributions of individual components, such as the reference encoder and the sampling modules, to codec performance and training stability.
Conclusion
Single-Codec marks a significant step toward efficient, high-performance speech generation. It streamlines encoding and decoding without the complexity of existing multi-codebook architectures. By committing to a single-sequence representation, it opens avenues for further advances in LLM-based TTS, including multilingual settings where efficient speech synthesis remains challenging. Future work could focus on further optimizing computational efficiency while maintaining high-fidelity output.