A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition (1810.11352v2)

Published 26 Oct 2018 in cs.SD and eess.AS

Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks. Building on that work, we propose a novel network architecture that introduces a pyramidal memory structure to represent different context information in different layers. In addition, res-CNN layers are added at the front of the network to extract more sophisticated features. Trained with joint lattice-free maximum mutual information (LF-MMI) and cross-entropy (CE) criteria, this approach achieves word error rates (WERs) of 3.62% and 10.89% on the Librispeech and LDC97S62 (Switchboard 300 hours) corpora, respectively. Furthermore, with recurrent neural network language model (RNNLM) rescoring, a WER of 2.97% is obtained on Librispeech.
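
To put the two Librispeech figures in the abstract side by side, the relative WER reduction from RNNLM rescoring can be worked out directly. The snippet below is only an illustrative sketch: the helper name `relative_wer_reduction` is hypothetical and not from the paper, and the only inputs are the two WERs quoted in the abstract.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER.

    Hypothetical helper for illustration; not part of the paper.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Librispeech WERs quoted in the abstract: 3.62% before RNNLM rescoring,
# 2.97% after rescoring.
reduction = relative_wer_reduction(3.62, 2.97)
print(f"Relative WER reduction from rescoring: {reduction:.2f}%")
```

On these numbers the rescoring step accounts for a relative reduction of roughly 18%, on top of the acoustic-model result.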

Authors: Xuerui Yang, Jiwei Li, Xi Zhou
