Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Published 26 Feb 2024 in cs.CL | (2402.16438v2)

Abstract: LLMs demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility to "steer" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence to the understanding and exploration of the multilingual capabilities of LLMs.

Summary

The paper introduces language activation probability entropy (LAPE), a method for identifying neurons that activate preferentially for particular languages, and uses it to quantify their role in multilingual processing.
Deactivating the identified neurons in models such as LLaMA-2 and BLOOM causes significant performance drops in language modeling and open-ended generation.
Language-specific neurons are concentrated in the bottom and top layers; manipulating them allows the output language to be controlled and offers insight into language dominance.

The paper "Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models" (2402.16438) addresses the challenge of understanding how LLMs process multilingual text even though they are not pre-trained on specially curated multilingual parallel corpora. It introduces a method for identifying language-specific neurons within Transformer architectures, offering insight into the mechanisms that underpin multilingual capabilities.

LAPE Methodology for Identifying Language-Specific Neurons

The cornerstone of the paper is the Language Activation Probability Entropy (LAPE) method, which is designed to pinpoint neurons that activate preferentially for specific languages. Multilingual corpora are fed into the LLM, and for each neuron the probability that it activates is estimated separately for each language. The LAPE score of a neuron is the entropy of these per-language activation probabilities, so it measures how evenly the neuron's activation is spread across languages. A low LAPE score indicates that a neuron is language-specific: it activates mainly in response to one language or a small set of languages. Mathematically, the LAPE score can be expressed as:

$\mathrm{LAPE}_i = - \sum_{l=1}^{L} P(a_{i,l}) \log P(a_{i,l})$

Where $\mathrm{LAPE}_i$ is the LAPE score for the $i$-th neuron, $L$ is the total number of languages, and $P(a_{i,l})$ is the activation probability of the $i$-th neuron in response to the $l$-th language. For the sum to be a proper entropy, the $L$ probabilities of each neuron are normalized across languages so that they sum to 1. Neurons with the lowest entropy across languages are flagged as language-specific.
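As a rough illustration of the score above, the following Python sketch computes LAPE for a small toy table of activation probabilities and picks the lowest-entropy neurons. The function names, the toy numbers, and the default selection fraction are assumptions made for illustration, not the authors' implementation. The sketch normalizes each neuron's probabilities across languages before taking the entropy.

```python
import math

def lape_scores(act_prob):
    """Entropy of each neuron's activation probabilities across languages.

    act_prob[i][l] is the (hypothetical) fraction of tokens in language l
    on which neuron i is active.
    """
    scores = []
    for probs in act_prob:
        total = sum(probs)
        if total == 0:
            # Assumption: a neuron that never fires is treated as having
            # maximal entropy, so it is never selected as language-specific.
            scores.append(math.log(len(probs)))
            continue
        normed = [p / total for p in probs]  # normalize across languages
        scores.append(-sum(p * math.log(p) for p in normed if p > 0))
    return scores

def select_language_specific(act_prob, fraction=0.01):
    """Indices of the neurons with the lowest LAPE scores."""
    scores = lape_scores(act_prob)
    k = max(1, int(len(scores) * fraction))
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]

# Toy example: 3 neurons, 3 languages.
act_prob = [
    [0.90, 0.01, 0.01],  # fires almost only for language 0
    [0.30, 0.30, 0.30],  # language-agnostic
    [0.02, 0.40, 0.45],  # shared between languages 1 and 2
]
scores = lape_scores(act_prob)
```

In this toy table the first neuron receives the lowest score and the language-agnostic neuron receives the maximum possible score, $\log 3$.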

Experimental Design and Evaluation Metrics

The study employed LLaMA-2 (7B, 13B, and 70B) and BLOOM (7.1B) to evaluate the impact of language-specific neurons on multilingual capabilities. The evaluation encompassed two primary tasks: language modeling and open-ended generation. Language modeling performance was assessed using perplexity (PPL) scores on Wikipedia corpora. The open-ended generation task utilized a translated version of the Vicuna dataset, with GPT-4 serving as the judge to evaluate the quality of the generated text. Ablation studies were conducted, involving the deactivation of identified language-specific neurons, and the resulting performance degradation was measured. Alternative identification methods, including Language Activation Value Entropy (LAVE), Parameter Variation (PV) based on monolingual instruction tuning, and Random Selection (RS), were used for comparison.
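The perplexity metric used for the language modeling task can be sketched as follows. This is a minimal illustration that assumes per-token natural-log probabilities are already available from the model; the helper names `perplexity` and `ppl_change` are hypothetical and do not come from the paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def ppl_change(baseline_logprobs, ablated_logprobs):
    """Increase in perplexity after an ablation (positive means worse)."""
    return perplexity(ablated_logprobs) - perplexity(baseline_logprobs)

# Toy example: the ablated model assigns lower probability to every token.
baseline = [math.log(0.25)] * 4
ablated = [math.log(0.10)] * 4
delta = ppl_change(baseline, ablated)
```

A positive change on text in the targeted language, with little change on other languages, is the pattern that a deactivation study of this kind looks for.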

Key Findings: Location and Impact of Language-Specific Neurons

The experimental results indicate that a small proportion of neurons exerts a disproportionately large influence on an LLM's ability to process a specific language. Deactivating these neurons led to a significant decline in both understanding and generation for the targeted language, measured as an increase in perplexity and a drop in GPT-4-based quality scores. The analysis also revealed that language-specific neurons are predominantly located in the bottom and top layers of LLMs. The paper's interpretation is that the bottom layers map inputs from different languages into a unified semantic space, while the top layers project the semantic content onto the corresponding vocabulary of each language.
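One simple way to inspect where the selected neurons sit is to count them per layer. The sketch below assumes each selected neuron is recorded as a `(layer, index)` pair; that representation, the function name, and the toy data are illustrative assumptions rather than details from the paper.

```python
from collections import Counter

def neurons_per_layer(selected, num_layers):
    """Count selected neurons in each layer, from layer 0 to num_layers - 1."""
    counts = Counter(layer for layer, _ in selected)
    return [counts.get(layer, 0) for layer in range(num_layers)]

# Toy example for a hypothetical 32-layer model.
selected = [(0, 12), (0, 40), (1, 7), (30, 3), (31, 5), (31, 9)]
hist = neurons_per_layer(selected, num_layers=32)
```

A histogram with most of its mass at the first and last few layers would match the distribution described above.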

Steering LLM Outputs Through Neuron Manipulation

The paper shows that the output language of an LLM can be controlled by selectively activating or deactivating language-specific neurons. Manually activating the neurons for a language, by raising their activation values to the average observed for that language, increased the likelihood that the model would respond in the language of the prompt. Cross-lingual generation was achieved by deactivating the neurons associated with the source language and activating those associated with the target language, which produced responses in the desired target language even when the prompt was written in a different language.
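The intervention can be sketched on a plain list of activation values for one layer. This is an illustration under stated assumptions: deactivation is modeled as setting a value to zero, activation as raising a value to at least a supplied per-language mean, and the function `steer` and its parameters are hypothetical names. In a real model the same logic would have to be applied inside the forward pass.

```python
def steer(activations, deactivate=(), activate=None):
    """Return a copy of one layer's activations with interventions applied.

    deactivate: indices of neurons to switch off (set to 0.0).
    activate:   dict mapping neuron index -> mean activation for the target
                language; the neuron is raised to at least that value.
    """
    out = list(activations)
    for i in deactivate:
        out[i] = 0.0
    for i, mean_value in (activate or {}).items():
        # Assumption: values already above the mean are left unchanged.
        out[i] = max(out[i], mean_value)
    return out

# Toy example: switch off a source-language neuron (index 0) and
# activate two target-language neurons (indices 2 and 3).
original = [0.5, 0.2, 0.0, 1.5]
steered = steer(original, deactivate=[0], activate={2: 0.8, 3: 0.8})
```

The function returns a new list, so the original activations stay available for comparison with the steered ones.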

Language Dominance and Resource Allocation

The analysis uncovered a dominance relationship between high-resource languages (e.g., English in LLaMA-2) and low-resource languages. This suggests that low-resource languages are aligned with high-resource languages within the model's representation space, which has implications for transfer learning and cross-lingual adaptation strategies. This dominance was observed through the degree of overlap in language-specific neuron activation patterns, with high-resource languages exhibiting more distinct and robust activation profiles.
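Overlap between the language-specific neuron sets of two languages can be measured by counting shared neurons. The sketch below assumes each language's neurons are stored as a set of `(layer, index)` pairs and uses raw intersection counts; the language codes, the data, and the choice of measure are illustrative assumptions, not figures from the paper.

```python
def overlap_counts(neurons_by_lang):
    """Number of shared language-specific neurons for each language pair."""
    langs = sorted(neurons_by_lang)
    return {
        (a, b): len(neurons_by_lang[a] & neurons_by_lang[b])
        for a in langs
        for b in langs
        if a < b  # each unordered pair once
    }

# Toy example with hypothetical neuron sets.
neurons_by_lang = {
    "en": {(0, 1), (0, 2), (31, 4)},
    "fr": {(0, 2), (31, 4), (31, 8)},
    "zh": {(0, 9), (31, 8)},
}
overlaps = overlap_counts(neurons_by_lang)
```

Dividing each count by the size of the smaller set would give a normalized measure that is easier to compare across language pairs.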

In summary, the study provides a detailed examination of language-specific neurons in LLMs, offering insights into their location, impact, and potential for manipulating multilingual outputs. The LAPE method and the experimental findings contribute to a deeper understanding of how LLMs achieve multilingual capabilities.
