Emergent Mind

The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020

(2011.00200)
Published Oct 31, 2020 in cs.SD , cs.CL , and eess.AS

Abstract

This report describes the systems submitted to the first and second tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020, which ranked second in both tracks. Three key points of the system pipeline are explored: (1) investigating multiple CNN architectures including ResNet, Res2Net and dual path network (DPN) to extract the x-vectors, (2) using a composite angular margin softmax loss to train the speaker models, and (3) applying score normalization and system fusion to boost the performance. Measured on the VoxSRC-20 Eval set, the best submitted systems achieve an EER of $3.808\%$ and a MinDCF of $0.1958$ in the close-condition track 1, and an EER of $3.798\%$ and a MinDCF of $0.1942$ in the open-condition track 2, respectively.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.