Adapting Large Language Models for Document-Level Machine Translation (2401.06468v4)
Abstract: Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs. We first investigate the impact of prompt strategies on translation performance, and then conduct extensive experiments with two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our results show that specialized models can sometimes surpass GPT-4 in translation performance but still suffer from issues such as off-target translation caused by error propagation during decoding. We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, training and inference strategies, the data efficiency of parallel documents, evaluation on recent test sets, and zero-shot cross-lingual transfer. Our findings highlight the strengths and limitations of LLM-based DocMT models and provide a foundation for future research.
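The abstract does not name the two fine-tuning methods. As a concrete illustration only, the sketch below shows one plausible setup: LoRA-based parameter-efficient fine-tuning of a decoder-only backbone on a document-level translation prompt, with the loss computed only on the target document. The backbone name, prompt template, language pair, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact setup): LoRA fine-tuning of a
# decoder-only LLM on document-level parallel data using Hugging Face
# transformers + peft. All names and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

def build_example(src_doc: str, tgt_doc: str) -> dict:
    """Pack one source/target document pair into a single training sequence."""
    prompt = (
        "Translate the following document from English to German.\n"
        f"{src_doc}\n### Translation:\n"
    )
    enc = tokenizer(prompt + tgt_doc + tokenizer.eos_token,
                    truncation=True, max_length=2048, return_tensors="pt")
    labels = enc["input_ids"].clone()
    # Mask the prompt tokens so the loss covers only the target document.
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    return {"input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
            "labels": labels}

# One illustrative gradient step on a toy document pair.
batch = build_example("Hello. This is a short document.",
                      "Hallo. Das ist ein kurzes Dokument.")
loss = model(**batch).loss
loss.backward()
```

A full fine-tuning variant would update all backbone parameters instead of attaching adapters; the document-level prompt packing and loss masking would stay the same.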
Authors: Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari