ECAPA-TDNN Embeddings for Speaker Diarization (2104.01466v1)

Published 3 Apr 2021 in eess.AS

Abstract: Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental component of modern diarization systems. Recently, some improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, for instance, has shown impressive performance in the speaker verification domain, thanks to a carefully designed neural model. In this work, we extend, for the first time, the use of the ECAPA-TDNN model to speaker diarization. Moreover, we improved its robustness with a powerful augmentation scheme that concatenates several contaminated versions of the same signal within the same training batch. The ECAPA-TDNN model turned out to provide robust speaker embeddings under both close-talking and distant-talking conditions. Our results on the popular AMI meeting corpus show that our system significantly outperforms recently proposed approaches.
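The augmentation idea mentioned in the abstract (several contaminated versions of the same signal placed in one training batch) can be illustrated with a minimal sketch. The abstract does not say which contaminations are used, so the white-noise and single-echo functions below, along with every function name and parameter value, are hypothetical stand-ins and not the authors' implementation.

```python
import numpy as np


def add_noise(wavs, snr_db, rng):
    """Add white Gaussian noise at a given SNR in dB (assumed contamination)."""
    signal_power = np.mean(wavs ** 2, axis=1, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(wavs.shape) * np.sqrt(noise_power)
    return wavs + noise


def add_echo(wavs, decay=0.5, delay=160):
    """Add one delayed, attenuated copy: a crude stand-in for reverberation."""
    out = wavs.copy()
    out[:, delay:] += decay * wavs[:, :-delay]
    return out


def concat_augment(wavs, labels, rng):
    """Stack the clean batch and its contaminated versions along the batch axis.

    wavs:   array of shape (batch, samples)
    labels: array of shape (batch,) holding speaker ids
    Returns arrays of shape (4 * batch, samples) and (4 * batch,).
    """
    noisy = add_noise(wavs, snr_db=10.0, rng=rng)
    versions = [wavs, noisy, add_echo(wavs), add_echo(noisy)]
    aug_wavs = np.concatenate(versions, axis=0)
    # Each version keeps the original speaker labels, in the same order.
    aug_labels = np.tile(labels, len(versions))
    return aug_wavs, aug_labels


rng = np.random.default_rng(0)
wavs = rng.standard_normal((2, 1600))   # two fake utterances
labels = np.array([0, 1])               # their speaker ids
aug_wavs, aug_labels = concat_augment(wavs, labels, rng)
```

Because every contaminated copy carries the same speaker label as its clean source, the model sees one speaker under several acoustic conditions within a single batch, which is the property the abstract credits for the improved robustness.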

Authors
  1. Nauman Dawalatabad
  2. Mirco Ravanelli
  3. François Grondin
  4. Jenthe Thienpondt
  5. Brecht Desplanques
  6. Hwidong Na
Citations (84)
