
Self-Training for End-to-End Speech Translation (2006.02490v2)

Published 3 Jun 2020 in cs.CL, cs.SD, and eess.AS

Abstract: One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.
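The pipeline the abstract describes can be sketched compactly: a teacher (an ASR+MT cascade, or the end-to-end model itself in the self-training variant) labels unlabeled audio, and the end-to-end student is then trained on the union of real and pseudo-labeled data. The sketch below is a hypothetical illustration under that reading, not the authors' code or any specific library API; all names (Example, cascade_pseudo_label, train_end_to_end, self_training) are placeholders.

```python
# Minimal sketch of the pseudo-labeling loop the abstract describes.
# All names here are hypothetical placeholders, not the paper's actual
# implementation or any real library's API.

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    audio: List[float]   # raw waveform samples (placeholder)
    translation: str     # target-language text


def cascade_pseudo_label(audio: List[float]) -> str:
    """Stand-in for the teacher: ASR transcribes, then MT translates.

    In the paper the teacher is either a trained cascade or, for
    self-training, the end-to-end model itself; here we return a
    dummy string so the sketch runs end to end.
    """
    return "<pseudo translation>"


def train_end_to_end(data: List[Example]) -> None:
    """Stand-in for training the end-to-end speech translation student."""
    print(f"training on {len(data)} examples")


def self_training(labeled: List[Example],
                  unlabeled_audio: List[List[float]]) -> None:
    # 1. Pseudo-label the unlabeled audio with the teacher model.
    pseudo = [Example(a, cascade_pseudo_label(a)) for a in unlabeled_audio]
    # 2. Train the student on real plus pseudo-labeled data.
    train_end_to_end(labeled + pseudo)


if __name__ == "__main__":
    labeled = [Example([0.0] * 16000, "bonjour")]
    unlabeled = [[0.0] * 16000, [0.1] * 16000]
    self_training(labeled, unlabeled)
```

Per the abstract, the gains depend on pseudo-label quality, and labels from an end-to-end teacher can substitute for the cascade's.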

Authors (5)
  1. Juan Pino (51 papers)
  2. Qiantong Xu (26 papers)
  3. Xutai Ma (23 papers)
  4. Mohammad Javad Dousti (17 papers)
  5. Yun Tang (42 papers)
Citations (56)
