Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets (2402.08015v5)

Published 12 Feb 2024 in cs.CL

Abstract: LLMs have received substantial attention in NLP research because of their exceptional performance in understanding and generating human language. However, low-resource languages are left behind due to the unavailability of resources. In this work, we focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets to improve LLM performance for Amharic. We compile an Amharic instruction fine-tuning dataset and fine-tune the LLaMA-2-Amharic model on it. The fine-tuned model shows promising results on different NLP tasks. We open-source our dataset creation pipeline, instruction datasets, trained models, and evaluation outputs to promote language-specific studies of these models.
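
The abstract describes supervised instruction fine-tuning of an existing LLaMA-2-Amharic checkpoint on a compiled instruction dataset. Below is a minimal sketch of what such a pipeline could look like using Hugging Face Transformers, Datasets, and PEFT (LoRA). The checkpoint id, prompt template, dataset fields, and all hyperparameters here are illustrative assumptions, not the paper's actual configuration.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint name (the paper builds on a LLaMA-2-Amharic model);
# the actual base model and revision may differ.
MODEL_ID = "iocuydi/llama-2-amharic-3784m"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Attach low-rank adapters so only a small fraction of the weights is trained.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

def format_example(ex):
    # Alpaca-style prompt template; the paper's actual template may differ.
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
    return tokenizer(prompt, truncation=True, max_length=512)

# Tiny placeholder dataset; in practice this would be the compiled Amharic
# instruction data drawn from task-specific and generative sources.
train_data = Dataset.from_list([
    {"instruction": "Translate to Amharic: Hello", "output": "ሰላም"},
]).map(format_example, remove_columns=["instruction", "output"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="walia-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           logging_steps=1),
    train_dataset=train_data,
    # The causal-LM collator pads batches and derives labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Parameter-efficient adapters (LoRA) are one plausible choice here because full fine-tuning of a 7B-parameter model is often impractical on modest hardware; the paper itself does not mandate this setup, so treat it as one reasonable instantiation of the described approach.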

