Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 58 tok/s
Gemini 2.5 Pro 52 tok/s Pro
GPT-5 Medium 12 tok/s Pro
GPT-5 High 17 tok/s Pro
GPT-4o 95 tok/s Pro
Kimi K2 179 tok/s Pro
GPT OSS 120B 463 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition (2012.04494v3)

Published 8 Dec 2020 in eess.AS

Abstract: Discriminative training techniques define state-of-the-art performance for automatic speech recognition systems. However, they are inherently prone to overfitting, leading to poor generalization performance when using limited training data. In order to address this issue, this paper presents a full Bayesian framework to account for model uncertainty in sequence discriminative training of factored TDNN acoustic models. Several Bayesian learning based TDNN variant systems are proposed to model the uncertainty over weight parameters and choices of hidden activation functions, or the hidden layer outputs. Efficient variational inference approaches using a few as one single parameter sample ensure their computational cost in both training and evaluation time comparable to that of the baseline TDNN systems. Statistically significant word error rate (WER) reductions of 0.4%-1.8% absolute (5%-11% relative) were obtained over a state-of-the-art 900 hour speed perturbed Switchboard corpus trained baseline LF-MMI factored TDNN system using multiple regularization methods including F-smoothing, L2 norm penalty, natural gradient, model averaging and dropout, in addition to i-Vector plus learning hidden unit contribution (LHUC) based speaker adaptation and RNNLM rescoring. Consistent performance improvements were also obtained on a 450 hour HKUST conversational Mandarin telephone speech recognition task. On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5\% absolute WER reduction.

Citations (17)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube