Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification (2110.05042v2)

Published 11 Oct 2021 in cs.SD and eess.AS

Abstract: This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK penalty methods, which were first proposed in our submitted system description for the VoxCeleb speaker recognition challenge (VoxSRC) 2021. Most multi-head attention pooling mechanisms either attend to the whole feature through multiple heads or attend to several split parts of the whole feature. Our proposed MQMHA combines these two mechanisms and gains more diversified information. Margin-based softmax loss functions are commonly adopted to obtain discriminative speaker representations. To further enhance inter-class discriminability, we propose a method that adds an extra inter-topK penalty on some easily confused speakers. By adopting both MQMHA and the inter-topK penalty, we achieve state-of-the-art performance on all of the public VoxCeleb test sets.
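
The abstract only sketches both ideas at a high level. The following is a minimal PyTorch sketch of the MQMHA pooling concept, assuming the feature is split channel-wise into heads and each head is attended by several queries, with attentive mean-and-std statistics concatenated; the module name, layer choices, and default head/query counts are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MQMHAPooling(nn.Module):
    """Illustrative multi-query multi-head attention pooling (assumed layout)."""

    def __init__(self, feat_dim: int, num_heads: int = 4, num_queries: int = 2):
        super().__init__()
        assert feat_dim % num_heads == 0
        self.num_heads, self.num_queries = num_heads, num_queries
        self.head_dim = feat_dim // num_heads
        # One attention-score projection per (head, query) pair over that head's sub-feature.
        self.score = nn.Conv1d(feat_dim, num_heads * num_queries,
                               kernel_size=1, groups=num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim, time) frame-level features from the encoder.
        B, D, T = x.shape
        H, Q, Hd = self.num_heads, self.num_queries, self.head_dim
        # Attention weights for every (head, query) pair, normalized over time.
        alpha = F.softmax(self.score(x).view(B, H, Q, T), dim=-1)   # (B, H, Q, T)
        xs = x.view(B, H, Hd, T)                                     # split feature into heads
        # Attentive weighted mean and std per (head, query), then concatenate.
        mean = torch.einsum('bhqt,bhdt->bhqd', alpha, xs)
        sq = torch.einsum('bhqt,bhdt->bhqd', alpha, xs * xs)
        std = (sq - mean * mean).clamp(min=1e-7).sqrt()
        return torch.cat([mean, std], dim=-1).flatten(1)             # (B, 2 * Q * feat_dim)
```

The pooled vector (dimension 2·Q·feat_dim under these assumptions) would then feed a linear layer producing the speaker embedding. Likewise, a hedged sketch of how an inter-topK penalty could sit on top of an additive-angular-margin (AAM) softmax: the K highest-scoring non-target speakers receive an extra additive penalty on their logits, so the loss pushes confusable classes apart harder. The function name, margin, penalty, and K values here are placeholders inferred from the abstract.

```python
import torch
import torch.nn.functional as F


def aam_softmax_inter_topk(embeddings, weight, labels,
                           scale=32.0, margin=0.2, top_k=5, penalty=0.06):
    # Cosine similarity between normalized embeddings and class centers.
    cosine = F.normalize(embeddings, dim=1) @ F.normalize(weight, dim=1).t()  # (B, C)
    target = cosine.gather(1, labels.unsqueeze(1))                            # (B, 1)
    # Additive angular margin on the target class: cos(theta + m).
    target_m = torch.cos(torch.acos(target.clamp(-1 + 1e-7, 1 - 1e-7)) + margin)
    logits = cosine.clone()
    logits.scatter_(1, labels.unsqueeze(1), target_m)
    # Inter-topK: extra penalty on the K most confusable non-target speakers.
    non_target = cosine.scatter(1, labels.unsqueeze(1), float('-inf'))
    _, topk_idx = non_target.topk(top_k, dim=1)
    logits.scatter_add_(1, topk_idx,
                        torch.full_like(topk_idx, penalty, dtype=logits.dtype))
    return F.cross_entropy(scale * logits, labels)
```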

Authors (6)
  1. Miao Zhao (14 papers)
  2. Yufeng Ma (7 papers)
  3. Yiwei Ding (13 papers)
  4. Yu Zheng (198 papers)
  5. Min Liu (236 papers)
  6. Minqiang Xu (17 papers)
Citations (21)
