LLaMA Beyond English: An Empirical Study on Language Capability Transfer (2401.01055v2)
Abstract: Recent advances in LLMs, exemplified by ChatGPT, have demonstrated remarkable proficiency across a range of complex tasks. However, many mainstream LLMs (e.g., LLaMA) are pretrained on English-dominant corpora, which limits their performance in non-English languages. In this paper, we focus on how to effectively transfer language generation and instruction-following capabilities to a non-English language. To answer this question, we conduct an extensive empirical investigation based on LLaMA, accumulating over 1440 GPU hours. We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer. To accurately assess the model's level of knowledge, we employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench. Furthermore, we conduct a comprehensive evaluation of the model's response quality, considering aspects such as accuracy, fluency, informativeness, logical coherence, and harmlessness, based on LLM-Eval, a benchmark consisting of instruction tasks from 17 diverse categories. Our evaluation results demonstrate that performance comparable to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data, in terms of both knowledge alignment and response quality. Furthermore, experimental outcomes across thirteen low-resource languages exhibit similar trends. We anticipate that the conclusions revealed by these experiments will aid the community in developing non-English LLMs.
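The abstract names vocabulary extension as one of the key transfer factors studied. As a minimal sketch (not the authors' released code), the snippet below shows what such a step typically looks like with the Hugging Face `transformers` library: merging extra target-language tokens into a LLaMA tokenizer and resizing the model's embeddings before further pretraining. The model path and the token file are hypothetical placeholders.

```python
# Sketch of a vocabulary-extension step, assuming Hugging Face transformers.
# "path/to/llama" and "new_tokens.txt" are hypothetical placeholders.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama")
model = LlamaForCausalLM.from_pretrained("path/to/llama")

# One target-language token per line in a plain-text file.
with open("new_tokens.txt", encoding="utf-8") as f:
    new_tokens = [line.strip() for line in f if line.strip()]

num_added = tokenizer.add_tokens(new_tokens)       # extend the vocabulary
model.resize_token_embeddings(len(tokenizer))      # grow embedding and LM-head rows
print(f"Added {num_added} tokens; new vocabulary size = {len(tokenizer)}")
```

The newly added embedding rows are randomly initialized, which is why further pretraining on target-language text is needed before the extended vocabulary becomes useful.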
- PaLM 2 Technical Report. arXiv:2305.10403.
- On the Cross-lingual Transferability of Monolingual Representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4623–4637. Online: Association for Computational Linguistics.
- Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv:2303.12712.
- NusaCrowd: Open Source Initiative for Indonesian NLP Resources. arXiv:2212.09648.
- Multilingual Alignment of Contextual Word Representations. arXiv:2002.03518.
- Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation. arXiv:2110.08547.
- Finding Universal Grammatical Relations in Multilingual BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5564–5577. Online: Association for Computational Linguistics.
- Training Verifiers to Solve Math Word Problems. arXiv:2110.14168.
- Unsupervised Cross-lingual Representation Learning at Scale. arXiv:1911.02116.
- Emerging Cross-lingual Structure in Pretrained Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6022–6034. Online: Association for Computational Linguistics.
- Free Dolly: Introducing the World’s First Truly Open Instruction-Tuned LLM.
- Chinese LLaMA and Alpaca Large Language Models.
- Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca. arXiv:2304.08177.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. Minneapolis, Minnesota: Association for Computational Linguistics.
- A Survey on In-context Learning. arXiv:2301.00234.
- Identifying Elements Essential for BERT’s Multilinguality. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4423–4437. Online: Association for Computational Linguistics.
- Zero-shot cross-lingual transfer language selection using linguistic similarity. Information Processing & Management, 60(3): 103250.
- Measuring Massive Multitask Language Understanding. arXiv:2009.03300.
- LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
- Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting. arXiv:2305.07004.
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; and Sabato, S., eds., Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, 9118–9147. PMLR.
- C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models. arXiv:2305.08322.
- BELLE: Be Everyone’s Large Language model Engine. https://github.com/LianjiaTech/BELLE.
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 5943–5959. Online: Association for Computational Linguistics.
- The State and Fate of Linguistic Diversity and Inclusion in the NLP World. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6282–6293. Online: Association for Computational Linguistics.
- GPT-4 Passes the Bar Exam. Available at SSRN 4389233.
- GLUECoS: An Evaluation Benchmark for Code-Switched NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 3575–3585. Online: Association for Computational Linguistics.
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling. arXiv:2103.07792.
- Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation. arXiv:2305.15011.
- Few-shot Learning with Multilingual Language Models. arXiv:2112.10668.
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts. arXiv:2306.11372.
- OpenAI. 2022. Introducing ChatGPT.
- OpenLMLab. 2023. Open-Chinese-LLaMA.
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. arXiv:2306.01116.
- How Multilingual is Multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4996–5001. Florence, Italy: Association for Computational Linguistics.
- Linguistic Diversity in Natural Language Processing. Traitement Automatique des Langues, 62(3): 7–11.
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv:2211.05100.
- StabilityAI. 2023. Announcing StableCode.
- Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots. arXiv:2103.09593.
- Alpaca: A Strong, Replicable Instruction-Following Model.
- Team, I. 2023a. InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities.
- Team, I. 2023b. InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities. https://github.com/InternLM/InternLM-techreport.
- LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971.
- Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288.
- Are Multilingual Models Effective in Code-Switching? arXiv:2103.13309.
- Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), 181–186. Florence, Italy: Association for Computational Linguistics.
- Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3541–3547. Hong Kong, China: Association for Computational Linguistics.
- Language Models are Few-shot Multilingual Learners. In Proceedings of the 1st Workshop on Multilingual Representation Learning, 1–15. Punta Cana, Dominican Republic: Association for Computational Linguistics.
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 833–844. Hong Kong, China: Association for Computational Linguistics.
- Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies. arXiv:2202.12312.
- LLMEVAL-1 Chinese Large Language Model Evaluation Phase 1.
- Evaluating the Performance of Large Language Models on GAOKAO Benchmark. arXiv:2305.12474.
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models. arXiv:2304.06364.
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis. arXiv:2304.04675.