Beyond English-Centric Multilingual Machine Translation

Published 21 Oct 2020 in cs.CL and cs.LG | (2010.11125v1)

Abstract: Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.

Authors (17)

Citations (777)

Summary

The paper introduces a many-to-many model that translates directly between any pair of 100 languages, removing the need to pivot through English.
It is trained on a mined dataset of 7.5 billion parallel sentences, built with a data mining strategy designed to improve non-English translation quality.
M2M-100 improves by more than 10 BLEU over English-Centric models in non-English directions.

Overview of "Beyond English-Centric Multilingual Machine Translation"

The paper, "Beyond English-Centric Multilingual Machine Translation," addresses a limitation of existing multilingual machine translation (MMT) models: most rely on English as a pivot language. The research introduces M2M-100, a many-to-many multilingual translation model that translates directly between any pair of its 100 languages without pivoting through English.
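
To make the gap concrete, here is a small sketch that counts translation directions for N languages. An English-Centric setup has direct supervision only for directions that include English, whereas a many-to-many model targets every ordered pair; any other direction needs two hops through English. The function names are hypothetical, and these are counts of possible directions, not of directions that actually have mined training data (the paper's dataset covers thousands of directions, not all of them).

```python
def english_centric_directions(num_languages: int) -> int:
    """Directions with direct supervision when every pair includes English."""
    # Each non-English language pairs with English, in both directions.
    return 2 * (num_languages - 1)


def many_to_many_directions(num_languages: int) -> int:
    """All ordered (source, target) pairs of distinct languages."""
    return num_languages * (num_languages - 1)


def pivot_path(src: str, tgt: str, pivot: str = "en") -> list[str]:
    """Sequence of languages an English-pivot system passes through for src -> tgt."""
    if src == pivot or tgt == pivot:
        return [src, tgt]
    # Two decoding passes: errors from the first hop feed into the second.
    return [src, pivot, tgt]


if __name__ == "__main__":
    n = 100
    print(english_centric_directions(n))  # 198
    print(many_to_many_directions(n))     # 9900
    print(pivot_path("hi", "fr"))         # ['hi', 'en', 'fr']
```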

Key Contributions

Dataset Creation: The authors build and open-source a large-scale many-to-many training dataset that provides supervised data for thousands of language directions, created through large-scale mining. It contains 7.5 billion training sentences across 100 languages and supplies direct training data for many non-English directions.
Model Scaling: The paper increases model capacity by combining dense scaling with language-specific sparse parameters. The largest model has 15.4 billion parameters, more than 50 times larger than conventional bilingual models.
Improvement Over English-Centric Models: M2M-100 improves on English-Centric models by more than 10 BLEU in non-English translation directions, while remaining competitive with the best single systems on WMT benchmarks. A single many-to-many model also covers all directions, instead of requiring a separate model or an English pivot step for each pair.

Research Methodology

Data Mining and Backtranslation

The data pipeline relies on a mining strategy, termed Bridge Language Group mining, that mines only selected language pairs, chosen on the basis of linguistic and geographic proximity. Languages are organized into groups; pairs within a group are mined, and groups are connected through a small number of bridge languages. This avoids the computational cost of exhaustively mining every possible language pair.
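
A minimal sketch of why this saves work, assuming a simplified reading of the scheme: every pair inside a group is mined, and groups are linked only through pairs of bridge languages. With 100 languages, exhaustive mining would mean 100 × 99 / 2 = 4,950 unordered pairs. The groups and bridge choices in the usage example are hypothetical and much smaller than the paper's.

```python
from itertools import combinations


def exhaustive_pairs(languages):
    """Every unordered language pair: N * (N - 1) / 2 of them."""
    return set(combinations(sorted(languages), 2))


def bridge_group_pairs(groups, bridges):
    """Pairs mined under a simplified bridge-language-group scheme.

    groups:  dict mapping group name -> list of member languages
    bridges: dict mapping group name -> members used as bridge languages
    """
    pairs = set()
    # 1) Mine every pair inside each group of related languages.
    for members in groups.values():
        pairs |= set(combinations(sorted(members), 2))
    # 2) Connect groups only through pairs of bridge languages.
    all_bridges = sorted({lang for group_bridges in bridges.values() for lang in group_bridges})
    pairs |= set(combinations(all_bridges, 2))  # set union drops duplicates
    return pairs


if __name__ == "__main__":
    groups = {
        "germanic": ["en", "de", "nl", "sv"],
        "romance": ["fr", "es", "it", "pt"],
        "indo_aryan": ["hi", "bn", "mr", "ur"],
    }
    bridges = {"germanic": ["en", "de"], "romance": ["fr", "es"], "indo_aryan": ["hi"]}
    languages = [lang for members in groups.values() for lang in members]
    print(len(exhaustive_pairs(languages)))         # 66
    print(len(bridge_group_pairs(groups, bridges)))  # 26
```

Even on this toy example the mined set is well under half of the exhaustive one; the saving grows with the number of languages, because within-group pairs scale with group size rather than with the total language count.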

To extend the dataset further, the researchers use backtranslation to generate synthetic training data for low-resource language pairs. This substantially improves translation quality in directions that initially had low BLEU scores.
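
Backtranslation is a standard technique, not one specific to this paper: to improve a src→tgt direction, genuine target-language text is translated into the source language by a tgt→src model, and the resulting (synthetic source, real target) pairs are added to the training data. The sketch below assumes a hypothetical callable `reverse_model`; the string-reversing lambda in the usage example is only a stand-in for a trained model.

```python
from typing import Callable, Iterable


def backtranslate(
    target_monolingual: Iterable[str],
    reverse_model: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Build synthetic (source, target) pairs for a src -> tgt direction.

    reverse_model translates tgt -> src. The human-written target side is
    kept as the training label; only the source side is machine-generated.
    """
    pairs = []
    for tgt_sentence in target_monolingual:
        synthetic_src = reverse_model(tgt_sentence)
        pairs.append((synthetic_src, tgt_sentence))
    return pairs


def augment(bitext, synthetic):
    """Concatenate mined bitext with backtranslated pairs for training."""
    return list(bitext) + list(synthetic)


if __name__ == "__main__":
    toy_reverse_model = lambda s: s[::-1]  # hypothetical stand-in for tgt -> src
    synthetic = backtranslate(["abc", "xy"], toy_reverse_model)
    print(synthetic)  # [('cba', 'abc'), ('yx', 'xy')]
    print(augment([("a", "b")], synthetic))
```

Keeping the real sentence on the target side matters: the model learns to produce fluent target text even when its input is noisy machine output.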

Multilingual Benchmark and Model Architecture

The researchers evaluate their models on a diverse set of publicly available benchmarks: WMT, WAT, IWSLT, FLORES, TED, Autshumato, and Tatoeba. Because these benchmarks differ in domain and in the language pairs they cover, performance is measured across many settings rather than on a single test set.

The M2M-100 model builds on established neural machine translation components: a Transformer architecture with large embedding dimensions and subword tokenization with SentencePiece. Language-specific parallel layers, random re-routing between those layers, and model parallelism let the model reach high capacity in the multilingual setting while remaining trainable.
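
A simplified sketch of the idea behind language-specific parallel layers, assuming routing by the target language's group: each group owns its own copy of a sublayer, so total capacity grows with the number of groups while each input uses only one copy. During training, a fraction of inputs is randomly re-routed through another group's layer. The class name, the scalar "weights" standing in for a real Transformer sublayer, the group names, and the 0.2 default re-routing probability are all hypothetical, not values from the paper.

```python
import random


class LanguageSpecificLayer:
    """One parallel sublayer per language group, selected by target language."""

    def __init__(self, group_of, weights, reroute_prob=0.2, seed=0):
        self.group_of = group_of          # language code -> group name
        self.weights = weights            # group name -> toy scalar parameter
        self.reroute_prob = reroute_prob  # chance a training input is re-routed
        self.rng = random.Random(seed)

    def route(self, tgt_lang, training):
        group = self.group_of[tgt_lang]
        if training and self.rng.random() < self.reroute_prob:
            # Random re-routing: use another group's layer for this input.
            others = [g for g in self.weights if g != group]
            if others:
                group = self.rng.choice(others)
        return group

    def forward(self, hidden, tgt_lang, training=False):
        group = self.route(tgt_lang, training)
        w = self.weights[group]
        # Only the selected group's parameters touch this input.
        return [w * h for h in hidden], group


if __name__ == "__main__":
    layer = LanguageSpecificLayer(
        group_of={"fr": "romance", "es": "romance", "hi": "indo_aryan"},
        weights={"romance": 2.0, "indo_aryan": 3.0},
    )
    print(layer.forward([1.0, 2.0], "fr"))  # ([2.0, 4.0], 'romance') at inference
```

At inference time re-routing is disabled, so each target language always uses its own group's parameters.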

Numerical Results and Implications

M2M-100's direct translation approach outperforms English-Centric models by a wide margin in non-English directions: translating directly between two non-English languages improves BLEU by more than 10 points over English-Centric models.
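
Because the headline result is a BLEU difference, a minimal sketch of corpus-level BLEU may help when interpreting it. BLEU combines clipped n-gram precisions (here up to 4-grams) with a geometric mean and multiplies the result by a brevity penalty, reporting the score on the 0 to 100 scale used for gains such as "+10 BLEU". This is a simplified illustration (whitespace tokenization, a single reference, no smoothing), not the SacreBLEU implementation, and it does not reproduce the paper's exact scoring setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counts of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0 to 100) with one reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram earns credit at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], ref))  # 100.0
    print(corpus_bleu(["the cat sat on"], ref))          # about 60.65 (brevity penalty)
```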

This many-to-many translation model has broad practical implications. It is especially relevant in regions and countries with several official languages, where it enables direct translation between native languages without English as an intermediary. The model's scalability also suggests potential applications in real-time translation services, multilingual content generation, and cross-lingual information retrieval.

Future Directions

The research points to several future avenues, notably improving translation for low-resource languages through better data mining, the incorporation of curated datasets, and further refinement of language-specific parameters. The paper also highlights the potential of domain-specific adaptation and user feedback for further improving translation quality.

Conclusion

The "Beyond English-Centric Multilingual Machine Translation" paper marks a significant advance in multilingual translation models. By shifting from an English-centric paradigm to a true many-to-many framework, the research addresses critical gaps in global translation needs. The combination of large-scale data mining and scalable model architectures produces strong results, supporting the practicality of the M2M-100 model for diverse multilingual applications.