Keyword spotting -- Detecting commands in speech using deep learning (2312.05640v1)
Abstract: Speech recognition has become an important task in the development of machine learning and artificial intelligence. In this study, we explore the important task of keyword spotting using speech recognition machine learning and deep learning techniques. We implement feature engineering by converting raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which we use as inputs to our models. We experiment with several different algorithms such as Hidden Markov Model with Gaussian Mixture, Convolutional Neural Networks and variants of Recurrent Neural Networks including Long Short-Term Memory and the Attention mechanism. In our experiments, RNN with BiLSTM and Attention achieves the best performance with an accuracy of 93.9 %
- Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10):1533–1545, 2014.
- Deep speech 2: End-to-end speech recognition in english and mandarin, 2015.
- Very deep convolutional neural networks for raw waveforms. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
- S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, 1980.
- A neural attention model for speech command recognition, 2018.
- Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649, 2013.
- B. Juang and L. Rabiner. Hidden markov models for speech recognition. Technometrics, 33(3):251 – 272, 1991.
- D. . Jurafsky. Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition. Pearson Prentice Hall, Upper Saddle River, N.J., 2009.
- M. McAteer. Getting started with Attention for Classification, 2018.
- P. Warden. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. ArXiv e-prints, Apr. 2018.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.