
A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition (1810.11352v2)

Published 26 Oct 2018 in cs.SD and eess.AS

Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks. Building on this work, we propose a novel network architecture which introduces a pyramidal memory structure to represent different amounts of context information at different layers. Additionally, res-CNN layers are added at the front to extract more sophisticated features. Together with the lattice-free maximum mutual information (LF-MMI) and cross entropy (CE) joint training criteria, experimental results show that this approach achieves word error rates (WERs) of 3.62% and 10.89% on the Librispeech and LDC97S62 (Switchboard 300 hours) corpora, respectively. Furthermore, recurrent neural network language model (RNNLM) rescoring is applied, and a WER of 2.97% is obtained on Librispeech.
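The core idea of an FSMN memory block is that each output frame is a learned weighted sum of the current hidden frame and a fixed window of past and future frames, with no recurrence; the "pyramidal" variant grows that window from lower to upper layers. The following is a minimal NumPy sketch of that idea only, not the authors' implementation: the per-tap weights are random stand-ins for learned parameters, and the specific per-layer context orders are hypothetical.

```python
import numpy as np

def fsmn_memory(h, order, rng):
    """Scalar-FSMN-style memory block (simplified sketch).

    Each output frame t is a weighted sum of frames t-order .. t+order,
    with one scalar coefficient per tap and per feature dimension.
    `weights` are random here; in a real model they are learned.
    """
    T, D = h.shape
    weights = rng.standard_normal((2 * order + 1, D)) * 0.1
    out = np.zeros_like(h)
    for t in range(T):
        for i, tau in enumerate(range(-order, order + 1)):
            s = t + tau
            if 0 <= s < T:  # zero-pad outside the utterance
                out[t] += weights[i] * h[s]
    return out

# Pyramidal structure: lower layers see short context, upper layers see
# progressively longer context (orders 1, 2, 4, 8 are illustrative only).
rng = np.random.default_rng(0)
T, D = 50, 40
x = rng.standard_normal((T, D))
for order in (1, 2, 4, 8):
    x = fsmn_memory(x, order, rng)
print(x.shape)  # (50, 40)
```

Because the memory block is a finite tapped-delay line rather than a recurrence, it can be trained and evaluated frame-parallel like a convolution, which is the practical appeal of FSMN-family models over LSTMs.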

Authors (3)
  1. Xuerui Yang (12 papers)
  2. Jiwei Li (137 papers)
  3. Xi Zhou (43 papers)
Citations (15)
