
Neural Speaker Diarization with Speaker-Wise Chain Rule (2006.01796v1)

Published 2 Jun 2020 in eess.AS, cs.CL, and cs.SD

Abstract: Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we remove this fixed-number-of-speakers limitation with a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each speaker's speech activity is regarded as a single random variable and is estimated sequentially, conditioned on the previously estimated speech activities of the other speakers. Like other sequence-to-sequence models, the proposed method produces a variable number of speakers by using a stop sequence condition. We evaluated the proposed method on multi-speaker audio recordings with a variable number of speakers. Experimental results show that the proposed method correctly produces diarization results for a variable number of speakers and outperforms the state-of-the-art end-to-end speaker diarization methods in terms of diarization error rate.
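
The sequential inference described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `decode_speakers`, `estimate_next`, and `toy_estimator` are hypothetical, the neural model is replaced by a toy rule, and the stop condition (an all-silent activity sequence) is an assumption made for illustration, since the abstract only says that a stop sequence condition is used.

```python
from typing import Callable, List, Sequence

Activity = List[int]  # one 0/1 speech-activity flag per frame


def decode_speakers(
    frames: Sequence[int],
    estimate_next: Callable[[Sequence[int], List[Activity]], Activity],
    max_speakers: int = 10,
) -> List[Activity]:
    """Estimate speakers one at a time, each conditioned on the earlier ones."""
    decoded: List[Activity] = []
    for _ in range(max_speakers):
        # The next speaker's activity depends on the audio and on every
        # activity sequence decoded so far (the chain-rule conditioning).
        activity = estimate_next(frames, decoded)
        # Assumed stop condition: an all-silent sequence ends decoding.
        if not any(activity):
            break
        decoded.append(activity)
    return decoded


def toy_estimator(frames: Sequence[int], previous: List[Activity]) -> Activity:
    """Hypothetical stand-in for a neural model.

    The k-th decoded speaker is marked active wherever the frame value
    equals k, so the number of speakers is discovered from the data.
    """
    label = len(previous) + 1
    return [1 if value == label else 0 for value in frames]


# Toy "audio": each frame carries the label of the active speaker, 0 = silence.
result = decode_speakers([1, 1, 2, 0, 2, 1], toy_estimator)
print(result)
```

In this toy run the loop decodes two speakers and then stops when the third estimate contains no active frames, so the number of speakers never has to be fixed in advance.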

Authors (6)
  1. Yusuke Fujita (37 papers)
  2. Shinji Watanabe (416 papers)
  3. Shota Horiguchi (45 papers)
  4. Yawen Xue (10 papers)
  5. Jing Shi (123 papers)
  6. Kenji Nagamatsu (19 papers)
Citations (43)
