xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning (2401.07037v1)

Published 13 Jan 2024 in cs.CL and cs.AI

Abstract: Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in LLMs and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCOT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCOT-INSTRUCT) is created to encourage the semantic alignment of multiple languages. We introduce cross-lingual in-context few-shot learning (xICL) to accelerate multilingual agreement in instruction tuning, where some fragments of source languages in examples are randomly substituted by their counterpart translations of target languages. During multilingual instruction tuning, we adopt the randomly online CoT strategy to enhance the multilingual reasoning ability of the LLM by first translating the query to another language and then answering in English. To further facilitate the language transfer, we leverage the high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results on previous benchmarks demonstrate the superior performance of xCoT in reducing the gap among different languages, highlighting its potential to reduce the cross-lingual gap.

Summary

  • The paper introduces xCoT, integrating cross-lingual instruction tuning with chain-of-thought reasoning to enhance multilingual NLP.
  • It employs a novel Random-CoT strategy and cross-lingual in-context few-shot learning to bridge high-resource and low-resource languages.
  • Experimental results show performance improvements of up to 15% on benchmarks such as MGSM, underscoring xCoT's stronger multilingual reasoning.

xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

The paper "xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning" introduces a novel approach to employ chain-of-thought (CoT) reasoning across multiple languages using cross-lingual instruction fine-tuning. This technique seeks to bridge the performance gap between high-resource languages, such as English, and low-resource languages, enhancing multilingual NLP tasks.

Introduction to Cross-lingual CoT

Chain-of-thought prompts have proven effective in eliciting reasoning within LLMs to solve complex tasks. However, CoT primarily succeeds in high-resource languages, and performance drops significantly in multilingual settings. This paper proposes xCoT, a framework that combines cross-lingual instruction tuning with cross-lingual in-context few-shot learning (xICL) to align language representations and transfer knowledge from resource-rich to resource-poor languages (Figure 1).

Figure 1: Illustration of xCoT. The cross-lingual instruction tuning is used to align representations of different languages.

xCoT Framework

The xCoT framework leverages cross-lingual instruction data, referred to as xCoT-Instruct, to encourage semantic alignment among multiple languages. For multilingual alignment it relies on cross-lingual in-context few-shot learning (xICL), which mixes tokens from different languages within the example queries. In addition, the framework employs a random online CoT strategy (Random-CoT) that first translates the query into another language and then reasons and answers in English (Figure 2).

Figure 2: Overview of xCoT. Cross-lingual in-context few-shot learning (xICL) encourages multilingual alignment in instruction tuning, where the query in the example is mixed with tokens from different languages. During multilingual instruction tuning, the random online CoT strategy (Random-CoT) promotes the multilingual reasoning ability of the LLM, which then answers in English. Finally, high-resource CoT is leveraged to supervise the training of low-resource languages via cross-lingual distillation.

Methodology

Cross-lingual Instruction Tuning

The xCoT-Instruct dataset is built by translating English instruction data into the target languages and supplementing it with cross-lingual in-context examples, where fragments of the source-language examples are randomly swapped with their target-language translations. During multilingual instruction tuning, Random-CoT asks the model to translate the query into another language before reasoning and answering in English. Reasoning on non-English queries is further strengthened by cross-lingual distillation, in which high-resource CoT outputs supervise training in low-resource languages.
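
To make these two data operations concrete, here is a minimal sketch rather than the authors' code: a toy xICL-style code-switching function that randomly swaps source-language fragments for target-language translations, and a Random-CoT-style prompt that asks the model to translate the query before answering in English. The function names, the phrase-level translation table, and the prompt wording are illustrative assumptions.

```python
import random

def xicl_mix(fragments, translations, swap_prob=0.3, seed=0):
    """Randomly replace source-language fragments of an in-context example
    with target-language translations (toy sketch of xICL code-switching)."""
    rng = random.Random(seed)
    mixed = []
    for frag in fragments:
        if frag in translations and rng.random() < swap_prob:
            mixed.append(translations[frag])
        else:
            mixed.append(frag)
    return " ".join(mixed)

def random_cot_prompt(query, source_lang, pivot_lang="English"):
    """Build a Random-CoT-style instruction: translate the query first,
    then reason step by step and answer in the pivot language."""
    return (
        f"Question ({source_lang}): {query}\n"
        f"First translate the question into {pivot_lang}, "
        f"then solve it step by step and give the final answer in {pivot_lang}."
    )

# Toy usage with a hypothetical phrase-level translation table (English -> German).
table = {"How many": "Wie viele", "apples": "Äpfel", "are left": "bleiben übrig"}
fragments = ["How many", "apples", "are left", "?"]
print(xicl_mix(fragments, table))
print(random_cot_prompt("Wie viele Äpfel bleiben übrig?", "German"))
```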

Implementation Details

The paper implements xCoT on top of Llama-2 and Bloom models, experimenting across multiple languages and datasets. The framework fine-tunes the LLMs on the xCoT-Instruct data and applies the Random-CoT strategy during training to encourage consistent performance across languages.
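
As a rough picture of what such instruction tuning looks like, the sketch below runs a generic supervised fine-tuning step over instruction/response pairs with a Hugging Face causal LM. It is not the authors' training code: the model name, hyperparameters, and data format are placeholders, and the real setup additionally mixes in Random-CoT instances and a cross-lingual distillation objective.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder backbone; Bloom works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Toy instruction/response pair standing in for an xCoT-Instruct example.
pairs = [
    ("First translate the question into English, then solve it step by step: "
     "Wie viele Äpfel bleiben übrig, wenn man 2 von 5 isst?",
     "The question asks how many apples remain after eating 2 of 5. 5 - 2 = 3. The answer is 3."),
]

def collate(batch):
    texts = [f"{inst}\n{resp}{tokenizer.eos_token}" for inst, resp in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(pairs, batch_size=1, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # standard causal-LM cross-entropy over the sequence
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```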

Evaluation and Results

xCoT is evaluated on multilingual benchmarks such as MGSM and MSVAMP, covering languages including German, Chinese, and Spanish. The experimental results demonstrate state-of-the-art performance, with significant improvements over baseline models and an average improvement of about 15%. Fine-tuning on the cross-lingually mixed instruction data proves especially effective at improving understanding in low-resource languages (Figure 3).
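
Accuracy on MGSM-style math benchmarks is typically scored by extracting the final number from the generated chain of thought and comparing it with the gold answer. The sketch below illustrates that scoring; the regular expression and the data layout are assumptions for illustration, not the paper's exact evaluation code.

```python
import re

def extract_answer(generation: str):
    """Take the last number in the generated chain of thought as the prediction."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return float(numbers[-1]) if numbers else None

def accuracy(generations, gold_answers):
    """Fraction of examples whose extracted prediction matches the gold answer."""
    hits = 0
    for gen, gold in zip(generations, gold_answers):
        pred = extract_answer(gen)
        if pred is not None and abs(pred - float(gold)) < 1e-6:
            hits += 1
    return hits / len(gold_answers)

# Hypothetical generation/gold pair in MGSM style.
gens = ["First translate the question ... 5 - 2 = 3. The answer is 3."]
golds = ["3"]
print(accuracy(gens, golds))  # 1.0
```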

Figure 3: (a) and (b) are representations of Llama-7B and our method from the last decoder layer. Each color denotes one language (11 languages in MGSM).
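
The kind of visualization shown in Figure 3 can be reproduced, in spirit, by pooling the last decoder layer's hidden states for parallel questions in each language and projecting them with t-SNE. The sketch below is an assumed illustration of that procedure, not the authors' analysis code; the model name and the mean-pooling choice are placeholders.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Toy parallel questions; MGSM provides the same items in 11 languages.
questions = {
    "en": "Janet has 3 apples and buys 2 more. How many apples does she have?",
    "de": "Janet hat 3 Äpfel und kauft 2 weitere. Wie viele Äpfel hat sie?",
    "es": "Janet tiene 3 manzanas y compra 2 más. ¿Cuántas manzanas tiene?",
}

feats, langs = [], []
with torch.no_grad():
    for lang, q in questions.items():
        enc = tokenizer(q, return_tensors="pt")
        out = model(**enc, output_hidden_states=True)
        # Mean-pool the last decoder layer as a sentence-level representation.
        feats.append(out.hidden_states[-1].mean(dim=1).squeeze(0).float().numpy())
        langs.append(lang)

emb = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(np.array(feats))
for (x, y), lang in zip(emb, langs):
    plt.scatter(x, y, label=lang)  # one color per language, as in Figure 3
plt.legend()
plt.savefig("tsne_languages.png")
```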

Analysis and Future Directions

Analysis of the results underscores the effectiveness of cross-lingual transfer through xCoT across a range of languages. The evaluations show that the model adapts well when handling multiple languages with distinct reasoning paths. Future work could explore more robust training datasets and the integration of additional resource-rich languages to broaden the practical applications of xCoT in multilingual settings.

Conclusion

This paper presents xCoT, a cross-lingual framework that integrates instruction tuning with CoT reasoning, reducing performance disparities across languages in multilingual NLP. The results confirm the advantage of a fine-tuned multilingual strategy, pushing LLMs toward stronger cross-lingual reasoning. The framework lays a solid foundation for advancing multilingual AI beyond traditional language limitations.
