How do Large Language Models Handle Multilingualism? (2402.18815v3)
Abstract: Large language models (LLMs) demonstrate impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language-ratio shifts across layers and the relationships between network structures and certain capabilities, we hypothesize a multilingual workflow for LLMs ($\texttt{MWork}$): in the first layers, LLMs understand the query, converting multilingual inputs into English for task solving; in the intermediate layers, they think in English via the self-attention structures and incorporate multilingual knowledge via the feed-forward structures; in the final layers, they generate responses in the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$), which identifies the neurons activated by inputs in a given language without any labeled data. Using $\texttt{PLND}$, we validate $\texttt{MWork}$ through extensive experiments that deactivate language-specific neurons across various layers and structures. Moreover, $\texttt{MWork}$ allows fine-tuning of language-specific neurons with a small dataset, enhancing multilingual ability in a target language without compromising others. With just $400$ documents, this approach yields average improvements of $3.6\%$ for high-resource languages and $2.3\%$ for low-resource languages across all tasks.
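The abstract describes $\texttt{PLND}$ and the deactivation experiments only at a high level. As an illustration of the general idea, not the authors' implementation, the sketch below flags feed-forward neurons whose average activation is much higher for one language's corpus than for a reference corpus, then zeroes them out to probe their role. The model name, the LLaMA-style module paths (`model.model.layers[i].mlp.down_proj`), and the activation-ratio threshold are all assumptions chosen for illustration; the paper's exact PLND importance criterion and its handling of self-attention neurons are not reproduced here.

```python
# Minimal sketch (assumed setup, not the paper's code): detect neurons that fire
# mostly for one language and deactivate them to test their contribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # hypothetical choice for illustration
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def avg_ffn_activation(texts):
    """Average |activation| of each FFN intermediate neuron over a small corpus."""
    sums, count = {}, 0

    def make_hook(layer_idx):
        def hook(module, inputs, output):
            # down_proj's input is the FFN intermediate activation (batch, seq, d_ff)
            act = inputs[0].detach().float().abs().mean(dim=(0, 1))
            sums[layer_idx] = sums.get(layer_idx, 0) + act
        return hook

    handles = [layer.mlp.down_proj.register_forward_hook(make_hook(i))
               for i, layer in enumerate(model.model.layers)]
    with torch.no_grad():
        for t in texts:
            ids = tok(t, return_tensors="pt", truncation=True, max_length=256)
            model(**ids)
            count += 1
    for h in handles:
        h.remove()
    return {i: s / count for i, s in sums.items()}

def language_specific_neurons(lang_texts, ref_texts, ratio=5.0):
    """Neurons far more active for lang_texts than ref_texts (threshold is an assumption)."""
    a, b = avg_ffn_activation(lang_texts), avg_ffn_activation(ref_texts)
    return {i: torch.nonzero(a[i] > ratio * (b[i] + 1e-6)).squeeze(-1) for i in a}

def deactivate(neurons):
    """Zero the selected FFN neurons by masking down_proj's input; returns hook handles."""
    def make_hook(idx):
        def hook(module, inputs):
            x = inputs[0].clone()
            x[..., idx] = 0
            return (x,)
        return hook
    return [model.model.layers[i].mlp.down_proj.register_forward_pre_hook(make_hook(idx))
            for i, idx in neurons.items() if idx.numel() > 0]

# Example probe (corpora are placeholders): silence neurons specific to French
# relative to English, then re-evaluate on a multilingual task to measure the drop.
# fr_neurons = language_specific_neurons(french_corpus, english_corpus)
# handles = deactivate(fr_neurons)
```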