Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan (2404.09342v3)

Published 14 Apr 2024 in cs.CV, cs.SD, and eess.AS

Abstract: Advances in technology have led to the use of multimodal systems in many real-world applications, among which audio-visual systems are some of the most widely used. In recent years, associating the face and voice of a person has gained attention because of the unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 explores face-voice association under the unique condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and people most often communicate in multilingual settings. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report describes the challenge, the dataset, the baselines, and the tasks of the FAME Challenge.
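The abstract does not spell out the evaluation protocol, but face-voice association challenges of this kind are typically scored by computing a similarity between a face embedding and a voice embedding for each verification trial and reporting the Equal Error Rate (EER). The sketch below illustrates that scoring step only; the cosine-similarity scoring, the embedding dimension, and the helper names are illustrative assumptions, not the official FAME evaluation code.

```python
# Minimal sketch (assumed, not the official FAME scoring code) of
# cross-modal verification scoring: cosine similarity between a face
# embedding and a voice embedding, then EER over a set of trials.
import numpy as np
from sklearn.metrics import roc_curve


def cosine_score(face_emb: np.ndarray, voice_emb: np.ndarray) -> float:
    """Cosine similarity between one face embedding and one voice embedding."""
    face_emb = face_emb / np.linalg.norm(face_emb)
    voice_emb = voice_emb / np.linalg.norm(voice_emb)
    return float(np.dot(face_emb, voice_emb))


def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """EER: the operating point where false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy trial list: each trial pairs a face embedding with a voice embedding
    # and a label (1 = same identity, 0 = different identity). Real embeddings
    # would come from pretrained face and speaker encoders.
    rng = np.random.default_rng(0)
    trials = [(rng.normal(size=512), rng.normal(size=512), int(rng.integers(0, 2)))
              for _ in range(100)]
    scores = np.array([cosine_score(f, v) for f, v, _ in trials])
    labels = np.array([y for _, _, y in trials])
    print(f"EER: {equal_error_rate(scores, labels):.3f}")
```

With random embeddings the EER hovers near 0.5 (chance level); a trained cross-modal embedding space would push it lower.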
