Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization (2306.14530v1)
Abstract: The clustering algorithm plays a crucial role in speaker diarization systems. However, traditional clustering algorithms struggle with the complex distribution of speaker embeddings and fail to exploit the latent relationships among speakers in a session. We propose a novel graph-based clustering approach called Community Detection Graph Convolutional Network (CDGCN) to improve the performance of the speaker diarization system. The CDGCN-based clustering method consists of three stages: graph generation, sub-graph detection, and Graph-based Overlapped Speech Detection (Graph-OSD). First, graph generation refines the local linkages among speech segments. Second, sub-graph detection finds the optimal global partition of the speaker graph. Finally, we treat speaker clustering for overlap-aware speaker diarization as an overlapped community detection task and design a Graph-OSD component that outputs overlap-aware labels. By capturing both local and global information, the speaker diarization system with CDGCN clustering outperforms traditional Clustering-based Speaker Diarization (CSD) systems on the DIHARD III corpus.
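The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' CDGCN: a cosine-similarity threshold stands in for the GCN-refined graph generation, connected components stand in for the community detection step, and a mean-similarity rule stands in for Graph-OSD. All function names, thresholds (`tau`, `overlap_tau`), and the 2-D embeddings are assumptions made purely for illustration.

```python
import numpy as np

def cosine_similarity(emb):
    # Pairwise cosine similarity between segment embeddings.
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return x @ x.T

def build_graph(sim, tau):
    # Stage 1 stand-in: link two segments when similarity >= tau
    # (the paper refines linkages with a GCN instead).
    adj = sim >= tau
    np.fill_diagonal(adj, False)
    return adj

def connected_components(adj):
    # Stage 2 stand-in: connected components as a crude partition
    # (a real system would run a community detection algorithm).
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def overlap_labels(sim, labels, overlap_tau):
    # Stage 3 stand-in: a segment also receives another community's label
    # when its mean similarity to that community is >= overlap_tau.
    out = []
    for i, own in enumerate(labels):
        speakers = {int(own)}
        for c in np.unique(labels):
            if c != own and sim[i, labels == c].mean() >= overlap_tau:
                speakers.add(int(c))
        out.append(speakers)
    return out

# Hypothetical embeddings: two segments per speaker plus one mixed segment.
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 0.6]])
sim = cosine_similarity(emb)
labels = connected_components(build_graph(sim, tau=0.8))
result = overlap_labels(sim, labels, overlap_tau=0.5)
print(result)
```

In this toy input, the last segment lies between the two speaker directions, so it joins the first community through the thresholded graph and picks up the second speaker's label in the overlap step; the other four segments keep a single label.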
- Jie Wang
- Zhicong Chen
- Haodong Zhou
- Lin Li
- Qingyang Hong