A Better and Faster End-to-End Model for Streaming ASR (2011.10798v2)

Published 21 Nov 2020 in eess.AS and cs.SD

Abstract: End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the E2E model still tends to delay its predictions towards the end of an utterance and thus has much higher partial latency than a conventional ASR model. To address this issue, we encourage the E2E model to emit words early through an algorithm called FastEmit [3]. Improving latency naturally comes at the cost of quality. To compensate, we first explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR. Second, we explore running a 2nd-pass beam search to improve quality. To ensure that the 2nd pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality and latency tradeoff for streaming ASR.
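FastEmit [3] is the most concrete mechanism named in this abstract, so here is a hedged illustration of how it can act on an RNN-T loss. This is not the authors' implementation. It assumes FastEmit can be expressed as scaling the gradient of every label-emission transition in the RNN-T lattice by (1 + λ) while leaving blank-transition gradients unchanged, which is our reading of the FastEmit paper. The sketch computes the RNN-T forward (alpha) and backward (beta) variables over the T × (U+1) lattice in pure Python. The function names `rnnt_forward_backward`, `rnnt_loss_and_grads` and `toy_lattice`, the `fastemit_lambda` parameter and the random toy inputs are all hypothetical. Gradients are taken with respect to the gathered per-node log-probabilities; the chain rule back through the softmax is omitted.

```python
import math
import random

NEG_INF = float("-inf")


def logaddexp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_forward_backward(log_probs, labels, blank=0):
    """Alpha/beta over the RNN-T lattice (illustrative sketch).

    log_probs[t][u][k] = log P(k | frame t, u labels emitted so far),
    for t in [0, T) and u in [0, U]. Returns (alpha, beta, log P(y|x)).
    """
    T, U = len(log_probs), len(labels)
    blk = lambda t, u: log_probs[t][u][blank]      # blank: advance t, keep u
    lab = lambda t, u: log_probs[t][u][labels[u]]  # label: emit y_{u+1}, keep t

    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t - 1][u] + blk(t - 1, u))
            if u > 0:
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t][u - 1] + lab(t, u - 1))

    # beta has an extra row t == T; only (T, U) is reachable, via the final blank.
    beta = [[NEG_INF] * (U + 1) for _ in range(T + 1)]
    beta[T][U] = 0.0
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            b = beta[t + 1][u] + blk(t, u)
            if u < U:
                b = logaddexp(b, beta[t][u + 1] + lab(t, u))
            beta[t][u] = b
    return alpha, beta, beta[0][0]


def rnnt_loss_and_grads(log_probs, labels, blank=0, fastemit_lambda=0.0):
    """Return (-log P(y|x), grads w.r.t. blank log-probs, grads w.r.t. label log-probs).

    Assumed FastEmit-style variant (hypothetical): label-emission gradients are
    scaled by (1 + fastemit_lambda); blank gradients and the loss value are unchanged.
    """
    alpha, beta, log_p = rnnt_forward_backward(log_probs, labels, blank)
    T, U = len(log_probs), len(labels)
    g_blank = [[0.0] * (U + 1) for _ in range(T)]
    g_label = [[0.0] * U for _ in range(T)]
    for t in range(T):
        for u in range(U + 1):
            # Posterior occupancy of the blank transition (t, u) -> (t + 1, u).
            occ = math.exp(alpha[t][u] + log_probs[t][u][blank] + beta[t + 1][u] - log_p)
            g_blank[t][u] = -occ
            if u < U:
                # Posterior occupancy of the label transition (t, u) -> (t, u + 1).
                occ = math.exp(alpha[t][u] + log_probs[t][u][labels[u]] + beta[t][u + 1] - log_p)
                g_label[t][u] = -(1.0 + fastemit_lambda) * occ
    return -log_p, g_blank, g_label


def log_softmax(xs):
    m = max(xs)
    z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - z for x in xs]


def toy_lattice(T, U, V, seed=0):
    """Random normalized log-probs with shape [T][U + 1][V] (toy data only)."""
    rng = random.Random(seed)
    return [[log_softmax([rng.gauss(0.0, 1.0) for _ in range(V)])
             for _ in range(U + 1)] for _ in range(T)]


if __name__ == "__main__":
    lp = toy_lattice(T=4, U=2, V=3)
    loss, gb, gl = rnnt_loss_and_grads(lp, labels=[1, 2], fastemit_lambda=0.01)
    print(loss, sum(map(sum, gb)), sum(map(sum, gl)))
```

Because every lattice path takes exactly T blank transitions and U label transitions, the blank gradients should sum to about -T and the label gradients to about -(1 + λ)U. Under this reading, the loss value itself is unchanged and only the pressure on label emissions grows, which is why FastEmit pushes the model to emit earlier rather than changing what it emits.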

Authors (15)
  1. Bo Li (1107 papers)
  2. Anmol Gulati (13 papers)
  3. Jiahui Yu (65 papers)
  4. Tara N. Sainath (79 papers)
  5. Chung-Cheng Chiu (48 papers)
  6. Arun Narayanan (34 papers)
  7. Ruoming Pang (59 papers)
  8. Yanzhang He (41 papers)
  9. James Qin (20 papers)
  10. Wei Han (202 papers)
  11. Qiao Liang (26 papers)
  12. Yu Zhang (1400 papers)
  13. Trevor Strohman (38 papers)
  14. Yonghui Wu (115 papers)
  15. Shuo-yiin Chang (25 papers)
Citations (122)