Spot keywords from very noisy and mixed speech (2305.17706v1)
Published 28 May 2023 in cs.SD, cs.AI, and eess.AS
Abstract: Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we tackle a more challenging task: detecting keywords buried under strong interfering speech (with an amplitude 10 times that of the keyword) and, worse still, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords in noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B2) architectures. Results on the Google Speech Commands dataset show that the proposed mix training approach is highly effective and outperforms both standard data augmentation and mixup training.
- Ying Shi
- Dong Wang
- Lantian Li
- Jiqing Han
- Shi Yin
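
To make the interference condition in the abstract concrete: an interfering signal with 10 times the keyword's amplitude corresponds to a signal-to-noise ratio of 20·log10(1/10) = -20 dB. The sketch below builds such a mixture with NumPy. It is only an illustration of the test condition, not the authors' Mix Training strategy or their data pipeline. The function names `mix_at_amplitude_ratio` and `snr_db` are hypothetical, and measuring amplitude as RMS is an assumption, since the abstract does not say how amplitude is defined.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude of a 1-D signal."""
    return np.sqrt(np.mean(np.square(x)))


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, computed from RMS amplitudes."""
    return 20.0 * np.log10(rms(signal) / rms(noise))


def mix_at_amplitude_ratio(keyword, interference, ratio=10.0):
    """Add interference whose RMS amplitude is `ratio` times the keyword's.

    Assumption: "amplitude" means RMS amplitude. Returns the mixture and
    the scaled interference so the resulting SNR can be checked.
    """
    keyword = np.asarray(keyword, dtype=np.float64)
    interference = np.asarray(interference, dtype=np.float64)

    # Trim both signals to a common length before mixing.
    n = min(len(keyword), len(interference))
    keyword, interference = keyword[:n], interference[:n]

    kw_rms, int_rms = rms(keyword), rms(interference)
    if kw_rms == 0 or int_rms == 0:
        raise ValueError("both signals must be non-silent")

    scaled = interference * (ratio * kw_rms / int_rms)
    return keyword + scaled, scaled


# Synthetic stand-ins for one second of 16 kHz audio (not real speech).
rng = np.random.default_rng(0)
keyword = rng.standard_normal(16000)
interference = rng.standard_normal(16000)

mixed, scaled = mix_at_amplitude_ratio(keyword, interference, ratio=10.0)
print(f"SNR of keyword vs. interference: {snr_db(keyword, scaled):.1f} dB")
```

With real recordings, `keyword` and `interference` would be waveforms loaded from the dataset; the random arrays here only stand in for them.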