Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Disentangling Voice and Content with Self-Supervision for Speaker Recognition (2310.01128v3)

Published 2 Oct 2023 in eess.AS and cs.AI

Abstract: For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Tianchi Liu (17 papers)
  2. Kong Aik Lee (77 papers)
  3. Qiongqiong Wang (20 papers)
  4. Haizhou Li (286 papers)
Citations (25)

Summary

We haven't generated a summary for this paper yet.