Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints (2403.14268v1)

Published 21 Mar 2024 in eess.AS and cs.SD

Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It handles a flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that uses speaker activity information to guide the Transformer encoders at the lower layer of the EEND-EDA model, enhancing the effect of their self-attention modules. Results on the public Mini LibriSpeech dataset demonstrate the effectiveness of the approach, reducing the Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to allow further research and reproducibility.
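
The abstract does not spell out the form of the auxiliary loss, so the following is only a rough sketch of how an attention constraint driven by speaker activity could look, not the authors' formulation. It assumes that a frame should attend to frames with the same speaker-activity pattern, and that the constraint is a cross-entropy between that target and the attention weights. The function names, the pattern-matching rule, and the `weight` value are all hypothetical.

```python
import math

def target_attention(activity):
    """Build a T x T target attention matrix from speaker activity.

    activity: list of T frames, each a list of S 0/1 speaker flags.
    Assumption (not from the paper): frame i should attend uniformly
    to every frame j with an identical activity pattern.
    """
    T = len(activity)
    target = []
    for i in range(T):
        row = [1.0 if activity[i] == activity[j] else 0.0 for j in range(T)]
        total = sum(row)  # at least 1, since a frame always matches itself
        target.append([v / total for v in row])
    return target

def attention_guidance_loss(attn, activity, eps=1e-8):
    """Mean cross-entropy between target rows and attention rows.

    attn: T x T attention weights (each row sums to 1) taken from one
    self-attention head; which layer and head to constrain is left to
    the caller.
    """
    target = target_attention(activity)
    T = len(attn)
    loss = 0.0
    for i in range(T):
        for j in range(T):
            loss -= target[i][j] * math.log(attn[i][j] + eps)
    return loss / T

def total_loss(diarization_loss, attn, activity, weight=0.1):
    """Combine the main diarization loss with the auxiliary term.

    weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return diarization_loss + weight * attention_guidance_loss(attn, activity)

# Toy example: 4 frames, 2 speakers.
activity = [[1, 0], [1, 0], [0, 1], [0, 0]]
uniform_attn = [[0.25] * 4 for _ in range(4)]
aligned_attn = target_attention(activity)

# Attention that follows speaker activity gets a lower auxiliary loss.
print(attention_guidance_loss(uniform_attn, activity))
print(attention_guidance_loss(aligned_attn, activity))
```

In this toy setup, attention that already follows the speaker-activity pattern yields a smaller auxiliary term than uniform attention, which is the behavior such a constraint is meant to encourage.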

Tweets

https://twitter.com/ArxivSound/status/1771025064202281459

https://twitter.com/AudioAndSpeech/status/1771062302990643342
