
Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization (2306.14530v1)

Published 26 Jun 2023 in eess.AS

Abstract: The clustering algorithm plays a crucial role in speaker diarization systems. However, traditional clustering algorithms struggle with the complex distribution of speaker embeddings and do not exploit the potential relationships between speakers in a session. We propose a novel graph-based clustering approach called Community Detection Graph Convolutional Network (CDGCN) to improve the performance of the speaker diarization system. The CDGCN-based clustering method consists of three components: graph generation, sub-graph detection, and Graph-based Overlapped Speech Detection (Graph-OSD). First, graph generation refines the local linkages among speech segments. Second, sub-graph detection finds the optimal global partition of the speaker graph. Finally, we view speaker clustering for overlap-aware speaker diarization as an overlapped community detection task and design a Graph-OSD component to output overlap-aware labels. By capturing both local and global information, the speaker diarization system with CDGCN clustering outperforms traditional Clustering-based Speaker Diarization (CSD) systems on the DIHARD III corpus.
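
The first two stages named in the abstract (building a graph over speech segments, then partitioning it into speaker groups) can be illustrated with a deliberately simplified sketch. This is not the authors' method: the abstract describes learned linkage refinement and a community detection step, whereas the code below substitutes a plain cosine-similarity k-nearest-neighbour graph and connected components, and it omits Graph-OSD entirely. The function names `knn_graph` and `connected_components`, the parameters `k` and `threshold`, and the toy embeddings are all hypothetical.

```python
import numpy as np


def knn_graph(embeddings, k=2, threshold=0.5):
    """Build a symmetric k-NN graph from cosine similarity (stand-in for graph generation)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-links
    n = len(x)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        # Indices of the k most similar segments to segment i.
        for j in np.argsort(sim[i])[::-1][:k]:
            if sim[i, j] >= threshold:
                adj[i, j] = adj[j, i] = True
    return adj


def connected_components(adj):
    """Label each node by its connected component (stand-in for sub-graph detection)."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels


# Toy 2-D "speaker embeddings": two well-separated groups of three segments each.
embeddings = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])
labels = connected_components(knn_graph(embeddings, k=2, threshold=0.5))
print(labels)
```

In this toy setting each segment receives exactly one speaker label. An overlap-aware system, as described in the abstract, would additionally allow a segment to belong to more than one community.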

Authors
  1. Jie Wang
  2. Zhicong Chen
  3. Haodong Zhou
  4. Lin Li
  5. Qingyang Hong
