Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition (2001.07067v1)

Published 20 Jan 2020 in eess.AS

Abstract: Speech recognition from raw waveform involves learning the spectral decomposition of the signal in the first layer of the neural acoustic model using a convolution layer. In this work, we propose a raw waveform convolutional filter learning approach using soft self-attention. The acoustic filter bank in the proposed model is implemented using a parametric cosine-modulated Gaussian filter bank whose parameters are learned. A network-in-network architecture provides self-attention to generate attention weights over the sub-band filters. The attention-weighted log filter bank energies are fed to the acoustic model for the task of speech recognition. Experiments are conducted on the Aurora-4 (additive noise with channel artifacts) and CHiME-3 (additive noise with reverberation) databases. In these experiments, the attention-based filter learning approach provides considerable improvements in ASR performance over the baseline mel filter-bank features and other robust front-ends (an average relative improvement of 7% in word error rate over baseline features on the Aurora-4 dataset and 5% on the CHiME-3 database). Using the self-attention weights, we also present an analysis of the interpretability of the filters for the ASR task.
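To make the pipeline in the abstract concrete, here is a minimal forward-pass sketch in plain Python: a bank of cosine-modulated Gaussian filters is applied to a raw waveform frame, log sub-band energies are computed, a small perceptron produces softmax attention weights over the sub-bands, and those weights scale the log energies. Several details are assumptions, not the paper's specification: the separate centre-frequency (`mu`) and width (`sigma`) parameters, the filter length, the frame size, the two-layer ReLU perceptron standing in for the network-in-network block, and applying the weights to the log energies rather than before the log. All function names are hypothetical, and no parameter learning (backpropagation) is shown.

```python
import math
import random


def cos_gauss_filter(mu, sigma, length):
    """Cosine-modulated Gaussian FIR filter (hypothetical parameterization).

    mu     -- centre frequency in cycles/sample (0 < mu < 0.5)
    sigma  -- Gaussian width in samples (assumed to be a free parameter)
    length -- number of taps (odd, so the filter is centred on n = 0)
    """
    half = length // 2
    return [
        math.cos(2 * math.pi * mu * n) * math.exp(-(n * n) / (2 * sigma * sigma))
        for n in range(-half, half + 1)
    ]


def filter_valid(x, h):
    # 'valid'-mode FIR filtering; h is symmetric, so correlation equals convolution.
    taps = len(h)
    return [sum(x[i + k] * h[k] for k in range(taps)) for i in range(len(x) - taps + 1)]


def log_energies(frame, filters, eps=1e-8):
    # Mean squared output of each sub-band filter, then log (eps avoids log(0)).
    feats = []
    for h in filters:
        y = filter_valid(frame, h)
        feats.append(math.log(sum(v * v for v in y) / len(y) + eps))
    return feats


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dense(x, W, b):
    # Fully connected layer: W is a list of rows, one row per output unit.
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def attention_weights(log_e, W1, b1, W2, b2):
    # Assumed stand-in for the network-in-network block: a two-layer ReLU
    # perceptron over the sub-band vector, softmax-normalized across filters.
    hidden = [max(0.0, v) for v in dense(log_e, W1, b1)]
    return softmax(dense(hidden, W2, b2))


def attended_features(frame, filters, W1, b1, W2, b2):
    # Returns (attention-weighted log energies, attention weights, raw log energies).
    le = log_energies(frame, filters)
    a = attention_weights(le, W1, b1, W2, b2)
    return [ai * ei for ai, ei in zip(a, le)], a, le


# Toy demo: 8 filters, a 400-sample frame (25 ms at 16 kHz), pure tone at 0.08 cycles/sample.
NUM_FILTERS, HIDDEN = 8, 4
filters = [cos_gauss_filter(0.02 + 0.06 * i, 10.0, 101) for i in range(NUM_FILTERS)]
frame = [math.sin(2 * math.pi * 0.08 * n) for n in range(400)]

rng = random.Random(0)  # randomly initialized (untrained) attention weights
W1 = [[rng.gauss(0, 0.1) for _ in range(NUM_FILTERS)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_FILTERS)]
b2 = [0.0] * NUM_FILTERS

feats, weights, le = attended_features(frame, filters, W1, b1, W2, b2)
print("strongest sub-band:", le.index(max(le)))
print("attention weights sum to", round(sum(weights), 6))
```

Because the test tone sits exactly at the centre of the second filter (index 1), that sub-band carries by far the largest log energy. In a trained model the attention weights would be learned jointly with the filter parameters; inspecting which sub-bands receive high weight is the kind of interpretability analysis the abstract describes.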
