Translatotron 3: Speech to Speech Translation with Monolingual Data (2305.17547v3)

Published 27 May 2023 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets that combines a masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results on speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, with an improvement of $18.14$ BLEU points on the synthesized Unpaired-Conversational dataset. Unlike supervised approaches, which require real paired data or specialized modeling to replicate para-/non-linguistic information such as pauses, speaking rates, and speaker identity, Translatotron 3 demonstrates the ability to retain this information. Audio samples can be found at http://google-research.github.io/lingvo-lab/translatotron3
