Emergent Mind

Abstract

LLMs have made significant strides in various NLP tasks. Recent research shows that moderately sized LLMs often outperform their larger counterparts after task-specific fine-tuning. In this work, we delve into the process of adapting LLMs to specialize in document-level machine translation (DocMT) for a specific language pair. First, we explore how prompt strategies affect downstream translation performance. Then, we conduct extensive experiments with two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our findings indicate that in some cases these specialized models surpass GPT-4 in translation performance, while in others they still suffer significantly from the off-target translation issue, even when fine-tuned exclusively on bilingual parallel documents. Furthermore, we provide an in-depth analysis of these LLMs tailored for DocMT, exploring aspects such as translation errors, discourse phenomena, training strategy, the scaling law of parallel documents, additional evaluation on recent test sets, and zero-shot cross-lingual transfer. Our findings not only shed light on the strengths and limitations of LLM-based DocMT models but also provide a foundation for future research.

Overview

  • LLMs have shown potential in NLP and specifically in DocMT, offering impressive results with context and coherence maintenance.

  • The study explores fine-tuning of medium-sized LLMs using PEFT and FFT methods and evaluates them on translation quality metrics.

  • Fine-tuned LLMs can outperform even larger models like GPT-4 in certain DocMT tasks, but success varies and off-target translations remain an issue.

  • LLMs fine-tuned with the FFT method were highly data-efficient, matching full-dataset performance with only 1% of the data, whereas PEFT required about 10%.

  • Fine-tuned LLMs excel in generalization and zero-shot cross-lingual transfer, setting a foundation for improved DocMT for low-resource languages.

Introduction to LLMs in Document-Level Translation

The potential of LLMs in the field of NLP has been demonstrated consistently across a variety of applications, carving out an impressive track record in tasks such as text generation, summarization, and question answering. In the specific arena of document-level machine translation (DocMT), which seeks to maintain context and coherence across the sentences of a document during translation, these models have produced remarkable but sometimes inconsistent results. This summary explores extensive research conducted to adapt LLMs for DocMT across multiple language pairs, focusing on the comparative performance of differently sized models under different fine-tuning techniques.
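To make the document-level setting concrete, a DocMT prompt can be assembled by presenting all sentences of a document in a single request, so the model sees cross-sentence context (pronouns, terminology, discourse markers) while translating. The template below is a hypothetical illustration of this idea, not the exact prompt format used in the paper.

```python
def build_docmt_prompt(sentences, src_lang="German", tgt_lang="English"):
    """Assemble a document-level translation prompt.

    Joining all source sentences into one instruction lets the model
    resolve references that span sentence boundaries, which a
    sentence-by-sentence prompt cannot do.
    """
    document = "\n".join(sentences)
    return (
        f"Translate the following {src_lang} document into {tgt_lang}, "
        f"preserving coherence across sentences:\n\n"
        f"{document}\n\n"
        f"Translation:"
    )

prompt = build_docmt_prompt(
    [
        "Der Vertrag wurde gestern unterzeichnet.",
        "Er tritt naechsten Monat in Kraft.",
    ]
)
print(prompt)
```

Here the pronoun "Er" in the second sentence can only be translated correctly ("It", referring to the contract) when the first sentence is visible in the same prompt.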

Exploring Fine-Tuning Strategies for Translation

Moderately sized LLMs, those containing around 7 billion parameters, were fine-tuned using two approaches: Parameter-Efficient Fine-Tuning (PEFT) and Full Fine-Tuning (FFT). These methods were assessed with an array of metrics designed to gauge translation quality. Despite exceptional performance on some tasks, the LLMs still faced challenges such as "off-target" translations, in which the output is produced in the wrong language. Moreover, the study examines the vital role of prompting strategies during the fine-tuning phase, revealing that certain prompt structures can significantly enhance LLM capabilities in translation tasks.
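The core difference between the two approaches is what gets updated: FFT trains every weight of the model, while PEFT methods such as LoRA freeze the pre-trained weights and learn a small low-rank correction. The sketch below is a minimal NumPy illustration of the LoRA idea, assuming LoRA as the PEFT method; the paper's actual training setup and hyperparameters are not reproduced here.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Only A and B (rank r) are updated during fine-tuning, so the number
    of trainable parameters is r * (d_in + d_out) instead of the full
    d_in * d_out of the base layer.
    """

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen, shape (d_out, d_in)
        self.A = rng.normal(0, 0.01, (r, W.shape[1]))   # trainable down-projection
        self.B = np.zeros((W.shape[0], r))              # trainable, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Base projection plus scaled low-rank correction. Because B starts
        # at zero, the layer initially behaves exactly like the frozen model.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

    def trainable_params(self):
        return self.A.size + self.B.size

d = 512
layer = LoRALinear(np.random.default_rng(1).normal(size=(d, d)), r=8)
print(layer.trainable_params(), d * d)  # 8192 trainable vs 262144 full weights
```

The zero-initialized `B` matrix means fine-tuning starts from the unmodified pre-trained behavior and only gradually departs from it, which is part of why PEFT is attractive for adapting large backbones to a single language pair.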

Key Findings in Translation Performance

The investigation yielded several key findings when comparing the translation proficiency of fine-tuned LLMs against other state-of-the-art models. Fine-tuned LLMs can surpass even GPT-4, one of the largest available models, on certain tasks. However, success is selective: in other scenarios these same models failed completely due to off-target translation. Remarkably, at comparable metric scores, the smaller fine-tuned LLMs produced fewer translation errors than the larger models. The two fine-tuning methods also differed in data efficiency: FFT required only about 1% of the full dataset to match the performance achieved with the whole set, while PEFT needed 10%.

Advancements and Implications for DocMT

The research implications extend to how LLMs compare with conventional document-level machine translation models. When evaluated on recently created test sets, the fine-tuned LLMs generalized better to out-of-domain text than conventional DocMT models. Furthermore, the study found that base LLMs given task-specific supervised fine-tuning exhibit stronger zero-shot cross-lingual transfer than instruction-tuned LLMs.

The evidence suggests that fine-tuning LLMs on parallel documents can unlock sophisticated translation abilities and thereby meaningfully improve DocMT models. Such models become particularly advantageous for tasks involving low-resource languages, potentially redefining translation approaches for diverse language pairs. The study lays a solid foundation for ongoing research and development in machine translation, pointing the way toward more refined, contextually aware, and accurate translation systems.
