Machine Translation in the Covid domain: an English-Irish case study for LoResMT 2021 (2403.01196v1)

Published 2 Mar 2024 in cs.CL and cs.AI

Abstract: Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques, using a Covid-adapted generic 55k corpus from the Directorate General of Translation, were applied. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data, from the Health and Education domains, was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
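The abstract names three domain adaptation strategies: fine-tuning, mixed fine-tuning, and training on a combined dataset. As a rough illustration of how a mixed fine-tuning corpus might be assembled, here is a minimal Python sketch. All file names and the oversampling ratio are hypothetical assumptions for illustration, not details taken from the paper.

```python
import random

# Hypothetical file names; the paper's actual corpora are the 55k
# Covid-adapted DGT generic corpus and a small in-domain Covid dataset.
GENERIC = "dgt_covid_adapted.en-ga.tsv"   # generic / out-of-domain pairs
IN_DOMAIN = "covid_health_edu.en-ga.tsv"  # in-domain Covid pairs

def read_pairs(path):
    """Read tab-separated English-Irish sentence pairs from a file."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if "\t" in line]

generic = read_pairs(GENERIC)
in_domain = read_pairs(IN_DOMAIN)

# Mixed fine-tuning: continue training on a mix of generic and in-domain
# data, oversampling the small in-domain side so it is not drowned out.
# The 5x ratio below is an illustrative choice, not the paper's setting.
oversample = 5
mixed = generic + in_domain * oversample
random.shuffle(mixed)

with open("mixed_finetune.en-ga.tsv", "w", encoding="utf-8") as f:
    for src, tgt in mixed:
        f.write(f"{src}\t{tgt}\n")
```

Under this reading, plain fine-tuning would continue training a generic-corpus model on the in-domain file alone, mixed fine-tuning would continue on the shuffled mixture written above, and the combined-dataset approach would train from scratch on the concatenation without oversampling.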

