
Keyword spotting -- Detecting commands in speech using deep learning (2312.05640v1)

Published 9 Dec 2023 in cs.SD, cs.AI, cs.CL, cs.HC, and eess.AS

Abstract: Speech recognition has become an important task in machine learning and artificial intelligence. In this study, we explore keyword spotting using machine learning and deep learning techniques for speech recognition. For feature engineering, we convert raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which serve as inputs to our models. We experiment with several algorithms: a Hidden Markov Model with Gaussian mixtures, Convolutional Neural Networks, and variants of Recurrent Neural Networks, including Long Short-Term Memory networks and an attention mechanism. In our experiments, an RNN with BiLSTM and attention achieves the best performance, with an accuracy of 93.9%.
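
The waveform-to-MFCC conversion mentioned in the abstract can be sketched in plain NumPy as follows. This is a minimal illustration of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II), not the authors' implementation. The abstract does not state any parameters, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, 40 mel filters, and 13 coefficients are all assumptions, as are the function names.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=40, n_coeffs=13):
    """Compute MFCCs; all default parameter values are assumed, not from the paper.

    Returns an array of shape (n_frames, n_coeffs).
    """
    # Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # Mel filterbank energies, then log compression.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)

    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

# Usage: one second of a synthetic 440 Hz tone standing in for a spoken command.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

Each row of the result is one frame's feature vector, so a one-second clip becomes a short sequence that a CNN can treat as a 2-D input or an RNN can consume frame by frame.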

Authors (3)
  1. Sumedha Rai
  2. Tong Li
  3. Bella Lyu
Citations (2)
