Improving the Transformer Translation Model with Document-Level Context (1810.03581v1)

Published 8 Oct 2018 in cs.CL

Abstract: Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to deal with discourse phenomena problematic for Transformer still remains a challenge. In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. As large-scale document-level parallel corpora are usually not available, we introduce a two-step training method to take full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English datasets and the IWSLT French-English datasets show that our approach improves over Transformer significantly.

Citations (244)

Summary

  • The paper introduces a specialized context encoder to capture document-level dependencies, substantially improving translation coherence.
  • The paper employs a two-step training process, pre-training on sentence-level corpora before fine-tuning with limited document-level data.
  • Experimental results on Chinese-English and French-English tasks reveal notable BLEU score improvements, confirming enhanced translation quality.

Improving the Transformer Translation Model with Document-Level Context

The paper explores an extension to the standard Transformer model for Neural Machine Translation (NMT), introducing a method to exploit document-level context, which sentence-level models cannot use effectively. The conventional Transformer translates on a sentence-by-sentence basis and therefore struggles with context-dependent discourse phenomena such as coreference and lexical cohesion. This paper proposes a mechanism for incorporating document-level context into the Transformer through a specialized context encoder.

Key Contributions and Methodology

The primary contribution is a context encoder that represents document-level context and integrates it into both the encoder and decoder of the Transformer. Integration relies on multi-head attention, which captures the long-range dependencies needed to model document-level phenomena. Given the scarcity of large-scale document-level parallel corpora, the authors present a two-step training method: the sentence-level parameters are first pre-trained on abundant sentence-level corpora, and the new document-level parameters are then estimated on the limited document-level corpora while the sentence-level parameters are kept fixed, preserving what was learned in the first step. This staged approach allows the model to benefit from context while coping with the shortage of document-level data.
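To make the architecture concrete, the following PyTorch sketch shows one way such a context encoder and gated context attention could be wired up, together with the parameter freezing used in the second training step. The module names, layer counts, and gating formulation are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """Encodes embeddings of surrounding document sentences with self-attention."""

    def __init__(self, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, ctx_embeddings):            # (batch, ctx_len, d_model)
        return self.encoder(ctx_embeddings)        # document-level representation


class ContextGate(nn.Module):
    """Attends from encoder/decoder hidden states to the context representation
    and merges the result back in with a learned gate."""

    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.ctx_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden, ctx):                # hidden: (batch, len, d_model)
        attended, _ = self.ctx_attn(hidden, ctx, ctx)
        g = torch.sigmoid(self.gate(torch.cat([hidden, attended], dim=-1)))
        return g * hidden + (1 - g) * attended     # gated fusion of context


def freeze_sentence_level(model, doc_prefixes=("ctx_encoder", "ctx_gate")):
    """Step two of training: update only the document-level modules,
    keeping the pre-trained sentence-level parameters fixed."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in doc_prefixes)
```

In the paper, context attention of this kind is applied inside each encoder and decoder layer; the sketch collapses it into a single standalone module for brevity.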

Experimental Evaluation

The methodology is evaluated on the NIST Chinese-English and IWSLT French-English datasets. Results show BLEU improvements over the baseline Transformer of 1.96 points on Chinese-English and 0.89 points on French-English, highlighting the efficacy of incorporating document-level context. The paper also compares its approach with a cache-based strategy adapted for the Transformer and finds that the proposed method performs better.
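As a concrete illustration of how such BLEU deltas are typically computed, the snippet below uses the sacrebleu package. The file names are placeholders rather than artifacts from the paper, and the NIST test sets ship with multiple references, which sacrebleu also supports.

```python
import sacrebleu


def read_lines(path):
    # One detokenized sentence per line; paths are hypothetical.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


baseline_hyps = read_lines("baseline.hyp")      # sentence-level Transformer output
context_hyps = read_lines("doc_context.hyp")    # document-context model output
references = read_lines("reference.txt")        # a single reference set

base = sacrebleu.corpus_bleu(baseline_hyps, [references]).score
ctx = sacrebleu.corpus_bleu(context_hyps, [references]).score
print(f"baseline BLEU {base:.2f}, +context BLEU {ctx:.2f}, delta {ctx - base:.2f}")
```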

Implications and Future Directions

The advancements presented in this work extend the utility of the Transformer by enabling more coherent, context-aware translations. This matters for deployed translation systems, where documents rather than isolated sentences are often the unit of translation. Theoretically, the work opens avenues for further exploration of context use in Transformer models, suggesting extensions beyond translation to other NLP tasks that require document-level understanding.

Looking ahead, a fruitful direction would be extending this methodology to other language pairs, especially typologically diverse ones, to evaluate the robustness of the proposed enhancements. Integrating document-level context modeling into unsupervised or low-resource translation scenarios could also be impactful, given the growing interest in achieving high translation quality with minimal supervision.

In summary, this paper presents a substantive methodological enhancement to the Transformer model, allowing it to effectively leverage document-level context, thereby addressing a notable limitation of sentence-level translation models. The results underscore the importance of context in achieving nuanced and accurate translations, positioning this approach as a valuable advancement in the field of neural machine translation.