
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter (2406.07096v1)

Published 11 Jun 2024 in eess.AS, cs.AI, cs.CL, cs.LG, and cs.SD

Abstract: Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicating model reuse and slowing down inference. This work presents a new approach to fast context-biasing with CTC-based Word Spotter (CTC-WS) for CTC and Transducer (RNN-T) ASR models. The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates. The valid candidates then replace their greedy recognition counterparts in corresponding frame intervals. A Hybrid Transducer-CTC model enables the CTC-WS application for the Transducer model. The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER compared to baseline methods. The proposed method is publicly available in the NVIDIA NeMo toolkit.
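The core idea in the abstract — score candidate context words directly against per-frame CTC log-probabilities, keep their frame intervals, and let high-scoring candidates replace the greedy output — can be illustrated with a toy sketch. All names here (`greedy_decode`, `score_word`, the dict-per-frame posteriors) are illustrative assumptions, not the NeMo CTC-WS implementation; the real method uses a compact context graph (trie) and a more refined CTC alignment.

```python
import math

BLANK = "_"  # CTC blank symbol (toy convention)

def greedy_decode(log_probs):
    """Greedy CTC decoding: collapse repeats, drop blanks.
    Returns (token, frame) pairs so each emission keeps its frame index."""
    out, prev = [], BLANK
    for t, frame in enumerate(log_probs):
        tok = max(frame, key=frame.get)
        if tok != BLANK and tok != prev:
            out.append((tok, t))
        prev = tok
    return out

def score_word(log_probs, word, start, end):
    """Forced-alignment-style score of `word` on frames [start, end):
    a simple DP over (frame, characters emitted) that allows blanks and
    repeats between characters. A toy stand-in for matching CTC
    log-probabilities against one branch of a context graph."""
    NEG = float("-inf")
    dp = [0.0] + [NEG] * len(word)  # dp[j]: best score with word[:j] emitted
    for t in range(start, end):
        frame = log_probs[t]
        new = [dp[0] + frame.get(BLANK, NEG)]  # still before the first char
        for j in range(1, len(word) + 1):
            stay = dp[j] + max(frame.get(BLANK, NEG), frame.get(word[j - 1], NEG))
            advance = dp[j - 1] + frame.get(word[j - 1], NEG)
            new.append(max(stay, advance))
        dp = new
    return dp[len(word)]

# Toy 5-frame posteriors over the alphabet {_, c, a, t}
hi, lo = math.log(0.85), math.log(0.05)
frames = [
    {"_": lo, "c": hi, "a": lo, "t": lo},
    {"_": hi, "c": lo, "a": lo, "t": lo},
    {"_": lo, "c": lo, "a": hi, "t": lo},
    {"_": hi, "c": lo, "a": lo, "t": lo},
    {"_": lo, "c": lo, "a": lo, "t": hi},
]

print(greedy_decode(frames))            # [('c', 0), ('a', 2), ('t', 4)]
print(score_word(frames, "cat", 0, 5))  # finite score near 0
print(score_word(frames, "cab", 0, 5))  # -inf: "b" never appears
```

In the paper's setting, a candidate whose alignment score beats the greedy hypothesis over the same frame interval would replace it there, which is what avoids modifying the model or the beam-search decoder.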

Authors (5)
  1. Andrei Andrusenko (12 papers)
  2. Aleksandr Laptev (14 papers)
  3. Vladimir Bataev (14 papers)
  4. Vitaly Lavrukhin (32 papers)
  5. Boris Ginsburg (111 papers)
