Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (2407.17902v1)

Published 25 Jul 2024 in eess.AS

Abstract: The human brain has the capability to associate the unknown person's voice and face by leveraging their general relationship, referred to as "cross-modal speaker verification". This task poses significant challenges due to the complex relationship between the modalities. In this paper, we propose a "Multi-stage Face-voice Association Learning with Keynote Speaker Diarization" (MFV-KSD) framework. MFV-KSD contains a keynote speaker diarization front-end to effectively address the noisy speech inputs issue. To balance and enhance the intra-modal feature learning and inter-modal correlation understanding, MFV-KSD utilizes a novel three-stage training strategy. Our experimental results demonstrated robust performance, achieving the first rank in the 2024 Face-voice Association in Multilingual Environments (FAME) challenge with an overall Equal Error Rate (EER) of 19.9%. Details can be found in https://github.com/TaoRuijie/MFV-KSD.
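
The abstract reports its result as an Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a number can be computed from face-voice match scores, here is a minimal sketch. It is not the FAME challenge's scoring script or the authors' code; the function name `equal_error_rate`, the label convention (1 for a matching face-voice pair, 0 otherwise), and the toy scores are assumptions made for this example.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by trying every observed score as a threshold.

    scores: similarity scores, higher means "more likely the same person".
    labels: 1 for a matching face-voice pair, 0 for a non-matching pair
            (an assumed convention for this sketch).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one matching and one non-matching pair")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # False acceptance: a non-matching pair scored at or above the threshold.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        # False rejection: a matching pair scored below the threshold.
        frr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # 0.25
```

On the toy data the two error rates meet at a threshold of 0.6, where one of four non-matching pairs is accepted and one of four matching pairs is rejected. Evaluation toolkits typically interpolate along the ROC curve rather than picking the nearest observed threshold, so values on real data may differ slightly from this approximation.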
