LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech (2110.09103v1)

Published 18 Oct 2021 in cs.SD, cs.CL, and eess.AS

Abstract: An effective approach to automatically predicting the subjective rating of synthetic speech is to train on a listening test dataset with human-annotated scores. Although each speech sample in such a dataset is rated by several listeners, most previous works used only the mean score as the training target. In this work, we present LDNet, a unified framework for mean opinion score (MOS) prediction that predicts the listener-wise perceived quality given the input speech and the listener identity. We incorporate recent advances in listener-dependent (LD) modeling, including design choices of the model architecture, and propose two inference methods that provide more stable results and efficient computation. We conduct systematic experiments on the voice conversion challenge (VCC) 2018 benchmark and a newly collected large-scale MOS dataset, providing an in-depth analysis of the proposed framework. Results show that the mean listener inference method is a better way to utilize the mean scores, and its effectiveness is more apparent when more ratings per sample are available.
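The core idea of the abstract, predicting a listener-dependent score from speech features plus a listener identity, and performing "mean listener" inference with a single reserved listener ID, can be illustrated with a toy sketch. This is not the paper's actual architecture (LDNet uses learned neural encoders); all parameters, dimensions, and the linear scorer below are hypothetical placeholders, randomly initialized for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N_LISTENERS = 8                  # hypothetical number of listeners in the training set
MEAN_LISTENER_ID = N_LISTENERS   # one extra ID reserved for the "mean listener"
EMB_DIM = 4                      # hypothetical listener-embedding size
FEAT_DIM = 16                    # hypothetical speech-feature size

# Stand-ins for learned parameters (random here, trained in the real model)
listener_emb = rng.normal(size=(N_LISTENERS + 1, EMB_DIM))
W = rng.normal(size=(FEAT_DIM + EMB_DIM,))

def predict_mos(speech_feat, listener_id):
    """Predict a listener-dependent MOS given speech features and a listener ID."""
    x = np.concatenate([speech_feat, listener_emb[listener_id]])
    raw = float(x @ W)
    return 1.0 + 4.0 / (1.0 + np.exp(-raw))  # squash output into the 1-5 MOS range

feat = rng.normal(size=FEAT_DIM)

# Listener-wise predictions for one utterance (one pass per known listener)
per_listener = [predict_mos(feat, i) for i in range(N_LISTENERS)]

# "Mean listener" inference: a single forward pass with the reserved ID,
# avoiding the cost of averaging over every listener's prediction
mean_listener_score = predict_mos(feat, MEAN_LISTENER_ID)
```

Here the mean-listener ID would be trained against the mean scores, so one forward pass approximates the average rating; the alternative is to average the per-listener predictions, which is more expensive at inference time.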

Authors (4)
  1. Wen-Chin Huang (53 papers)
  2. Erica Cooper (46 papers)
  3. Junichi Yamagishi (178 papers)
  4. Tomoki Toda (106 papers)
Citations (67)