
Multi-head Monotonic Chunkwise Attention For Online Speech Recognition (2005.00205v1)

Published 1 May 2020 in cs.CL, cs.SD, and eess.AS

Abstract: The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to calculate the attention context and thus is not suitable for online speech recognition. To deal with this problem, we propose multi-head monotonic chunk-wise attention (MTH-MoChA), an improved version of MoChA. MTH-MoChA splits the input sequence into small chunks and computes multi-head attention over the chunks. We also explore useful training strategies such as LSTM pooling, minimum word error rate training and SpecAugment to further improve the performance of MTH-MoChA. Experiments on AISHELL-1 data show that the proposed model, along with the training strategies, improves the character error rate (CER) of MoChA from 8.96% to 7.68% on the test set. On another 18,000-hour in-car speech dataset, MTH-MoChA obtains 7.28% CER, which is significantly better than a state-of-the-art hybrid system.
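
The abstract outlines the core mechanism: at each decoder step, every attention head looks only at a small chunk of encoder frames ending at a monotonically advancing boundary, so no future context is required for online decoding. The PyTorch snippet below is a minimal illustrative sketch of that chunkwise multi-head attention idea, assuming the monotonic boundary for each utterance is already given; the class and parameter names (ChunkwiseMultiHeadAttention, chunk_size, boundary) are hypothetical and do not reflect the authors' implementation.

```python
# Minimal sketch of chunkwise multi-head attention (illustrative, not the
# authors' code): each head attends only to a fixed-width chunk of encoder
# states ending at a given monotonic boundary frame.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChunkwiseMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, chunk_size: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.chunk_size = chunk_size
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, enc_states, boundary):
        """query: (B, d_model) decoder state for the current output step.
        enc_states: (B, T, d_model) encoder frames available so far (online).
        boundary: (B,) monotonically chosen encoder frame index; attention is
        restricted to the chunk [boundary - chunk_size + 1, boundary]."""
        B, T, _ = enc_states.shape
        q = self.q_proj(query).view(B, self.num_heads, self.d_head)
        k = self.k_proj(enc_states).view(B, T, self.num_heads, self.d_head)
        v = self.v_proj(enc_states).view(B, T, self.num_heads, self.d_head)

        # Scaled dot-product scores per head: (B, H, T)
        scores = torch.einsum("bhd,bthd->bht", q, k) / self.d_head ** 0.5

        # Mask out every frame outside the chunk ending at `boundary`.
        t_idx = torch.arange(T, device=enc_states.device).view(1, 1, T)
        b = boundary.view(B, 1, 1)
        in_chunk = (t_idx <= b) & (t_idx > b - self.chunk_size)
        scores = scores.masked_fill(~in_chunk, float("-inf"))

        attn = F.softmax(scores, dim=-1)                       # (B, H, T)
        context = torch.einsum("bht,bthd->bhd", attn, v).reshape(B, -1)
        return self.out_proj(context)


if __name__ == "__main__":
    mha = ChunkwiseMultiHeadAttention(d_model=256, num_heads=4, chunk_size=3)
    enc = torch.randn(2, 50, 256)        # 50 encoder frames seen so far
    dec = torch.randn(2, 256)            # current decoder state
    boundary = torch.tensor([10, 30])    # monotonic endpoints per utterance
    print(mha(dec, enc, boundary).shape)  # torch.Size([2, 256])
```

In the full MTH-MoChA model the boundary itself is predicted by a monotonic attention mechanism during decoding; the sketch above only shows how the chunk restriction and per-head attention combine once that boundary is known.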

Authors (5)
  1. Baiji Liu (3 papers)
  2. Songjun Cao (15 papers)
  3. Sining Sun (17 papers)
  4. Weibin Zhang (23 papers)
  5. Long Ma (116 papers)
Citations (9)
