Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 134 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 27 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 119 tok/s Pro
Kimi K2 180 tok/s Pro
GPT OSS 120B 418 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Towards a parallel corpus of Portuguese and the Bantu language Emakhuwa of Mozambique (2104.05753v1)

Published 12 Apr 2021 in cs.CL and cs.AI

Abstract: Major advancement in the performance of machine translation models has been made possible in part thanks to the availability of large-scale parallel corpora. But for most languages in the world, the existence of such corpora is rare. Emakhuwa, a language spoken in Mozambique, is like most African languages low-resource in NLP terms. It lacks both computational and linguistic resources and, to the best of our knowledge, few parallel corpora including Emakhuwa already exist. In this paper we describe the creation of the Emakhuwa-Portuguese parallel corpus, which is a collection of texts from the Jehovah's Witness website and a variety of other sources including the African Story Book website, the Universal Declaration of Human Rights and Mozambican legal documents. The dataset contains 47,415 sentence pairs, amounting to 699,976 word tokens of Emakhuwa and 877,595 word tokens in Portuguese. After normalization processes which remain to be completed, the corpus will be made freely available for research use.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.