Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition (2312.03668v2)

Published 6 Dec 2023 in eess.AS, cs.AI, cs.CL, and cs.LG

Abstract: Advances in machine learning have made it possible to perform various text and speech processing tasks, such as automatic speech recognition (ASR), in an end-to-end (E2E) manner. E2E approaches that use pre-trained models are gaining attention because they conserve training data and resources. However, most of their applications in ASR use only one of either a pre-trained speech model or a pre-trained language model. This paper proposes integrating a pre-trained speech representation model and a large language model (LLM) for E2E ASR. By combining the pre-trained models with a bridge network, the proposed model enables optimization of the entire ASR process, including acoustic feature extraction and both acoustic and language modeling. It also makes it possible to apply notable developments in LLM utilization, such as parameter-efficient domain adaptation and inference optimization. Experimental results demonstrate that the proposed model achieves performance comparable to that of modern E2E ASR models by leveraging powerful pre-trained models through the proposed integrated approach.
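The abstract does not specify how the bridge network is built, so the following is only a minimal sketch under stated assumptions. It assumes the bridge (1) shortens the speech feature sequence by concatenating every `k` consecutive frames and (2) applies an affine projection to the LLM's embedding width, and that the LLM then receives the projected speech vectors as a prefix in front of the text prompt embeddings. The functions `stack_frames`, `project`, `bridge`, and `build_llm_inputs` are hypothetical names for illustration; in a real system the projection weights would be learned and the speech features would come from the pre-trained speech representation model.

```python
from typing import List

Matrix = List[List[float]]  # row-major: one inner list per time step


def stack_frames(frames: Matrix, k: int) -> Matrix:
    """Downsample in time by concatenating every k consecutive frames.

    Trailing frames that do not fill a complete group are dropped.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    stacked: Matrix = []
    for start in range(0, len(frames) - k + 1, k):
        group: List[float] = []
        for frame in frames[start:start + k]:
            group.extend(frame)
        stacked.append(group)
    return stacked


def project(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Affine map y = x @ weight + bias, with weight shaped (in_dim, out_dim)."""
    out_dim = len(bias)
    return [
        [sum(xi * weight[i][j] for i, xi in enumerate(row)) + bias[j]
         for j in range(out_dim)]
        for row in x
    ]


def bridge(speech_feats: Matrix, k: int, weight: Matrix,
           bias: List[float]) -> Matrix:
    """Hypothetical bridge: frame stacking, then projection to the LLM width."""
    return project(stack_frames(speech_feats, k), weight, bias)


def build_llm_inputs(speech_feats: Matrix, prompt_embeds: Matrix, k: int,
                     weight: Matrix, bias: List[float]) -> Matrix:
    """Prepend the bridged speech embeddings to the text prompt embeddings."""
    return bridge(speech_feats, k, weight, bias) + prompt_embeds


if __name__ == "__main__":
    # Six 2-dim "speech frames" -> three 4-dim stacked vectors -> 3-dim LLM width.
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
              [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    W = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(4)]
    b = [0.5, 0.0, 0.0]
    seq = build_llm_inputs(frames, [[0.0, 0.0, 0.0]], 2, W, b)
    print(len(seq))  # 4 (3 speech positions + 1 prompt position)
    print(seq[0])    # [1.5, 2.0, 3.0]
```

Frame stacking is one simple way to reduce the length mismatch between speech frames and text tokens before the LLM sees them; strided convolutions or pooling are common alternatives, and nothing here should be read as the paper's exact design.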
