Attention Heads of Large Language Models: A Survey

(arXiv:2409.03752)
Published Sep 5, 2024 in cs.CL

Abstract

Since the advent of ChatGPT, LLMs have excelled in various tasks but remain largely black-box systems. Consequently, their development relies heavily on data-driven approaches, limiting performance enhancement through changes in internal architecture and reasoning pathways. As a result, many researchers have begun exploring the potential internal mechanisms of LLMs, aiming to identify the essence of their reasoning bottlenecks, with most studies focusing on attention heads. Our survey aims to shed light on the internal reasoning processes of LLMs by concentrating on the interpretability and underlying mechanisms of attention heads. We first distill the human thought process into a four-stage framework: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. Using this framework, we systematically review existing research to identify and categorize the functions of specific attention heads. Furthermore, we summarize the experimental methodologies used to discover these special heads, dividing them into two categories: Modeling-Free methods and Modeling-Required methods. We also outline relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions. Our reference list is open-sourced at https://github.com/IAAR-Shanghai/Awesome-Attention-Heads.

Figure: Collaborative mechanism of different attention heads in the IOI (Indirect Object Identification) task.

Overview

  • The paper surveys the mechanisms and functions of attention heads in LLMs, using a four-stage cognitive framework to categorize those functions.

  • It discusses how attention heads collaborate across model layers and presents methodologies for uncovering their functions, highlighting evaluation benchmarks and related interpretability topics.

  • The paper identifies limitations in current research and proposes future directions to enhance the interpretability and performance of LLMs.

An Essay on "Attention Heads of LLMs: A Survey"

The paper "Attention Heads of LLMs: A Survey," authored by Zifan Zheng et al., addresses a crucial aspect of the interpretability of LLMs by focusing on the mechanisms and functions of attention heads. As LLMs have gained popularity for their impressive performance across various tasks, understanding the internal workings of these models has become increasingly important. The survey provides a comprehensive overview of the interpretability efforts surrounding attention heads within LLMs, organizing the functions of these heads using an innovative four-stage cognitive framework.

The paper begins by acknowledging the remarkable performance of the Transformer architecture and its applications in NLP tasks. However, the black-box nature of deep neural networks, including LLMs, limits the understanding of their reasoning mechanisms. By shedding light on the internal processes of attention heads, the authors aim to enhance the interpretability of LLMs and improve their performance.

Four-Stage Cognitive Framework

The survey introduces a four-stage framework that mirrors the human thought process: Knowledge Recalling (KR), In-Context Identification (ICI), Latent Reasoning (LR), and Expression Preparation (EP). This framework is used to categorize the functions of specific attention heads and analyze their roles within LLMs' reasoning processes.

Knowledge Recalling (KR):

  • Attention heads in this stage are responsible for retrieving knowledge stored in the model's parameters.
  • Examples include associative memories, constant heads, and memory heads that focus on recalling relevant content based on the context.

In-Context Identification (ICI):

  • This stage focuses on identifying overall structural, syntactic, and semantic information within the context.
  • Attention heads such as positional heads, rare words heads, and duplicate token heads operate in this stage, capturing various types of contextual information.

Latent Reasoning (LR):

  • The LR stage involves synthesizing gathered information and performing logical reasoning.
  • Induction heads, summary readers, and task-specific heads like correct letter heads demonstrate the capabilities of attention heads during this stage (a sketch of how induction heads are typically scored follows this list).
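
As context for the induction heads mentioned above, the sketch below shows one common way such heads are scored in practice. This is a generic illustration rather than the survey's own procedure: the attention matrix `attn` and the helper `induction_score` are assumed here, with `attn` taken from a cached forward pass on a sequence of random tokens repeated twice.

```python
import numpy as np

def induction_score(attn: np.ndarray, half: int) -> float:
    """Score one head's attention pattern on a length-2*half sequence made of
    `half` random tokens repeated twice. An induction head should attend from
    each position i in the second half back to position i - half + 1, i.e. to
    the token that followed this token's first occurrence.

    attn: [seq_len, seq_len] attention weights (rows = destination positions,
          columns = source positions), e.g. one head's pattern from a cached run.
    """
    scores = [attn[i, i - half + 1] for i in range(half, 2 * half)]
    return float(np.mean(scores))

# Heads scoring close to 1.0 on repeated random sequences behave like
# induction heads; heads scoring near 0.0 do not.
```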

Expression Preparation (EP):

  • In the EP stage, the results of reasoning are formulated into a verbal expression.
  • Heads such as amplification heads, coherence heads, and mixed heads contribute to aligning the reasoning results with the output expression.

Collaborative Mechanisms and Methodologies

The paper also investigates how attention heads work together across different layers of the model. For instance, in the Indirect Object Identification (IOI) task, where the model must complete a sentence such as "When Mary and John went to the store, John gave a drink to" with "Mary" rather than "John", various heads collaborate through the KR, ICI, LR, and EP stages to determine the indirect object. This collaboration illustrates the interplay between different heads in achieving alignment between human and model reasoning.
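
As a rough illustration of how head-level behaviour on the IOI task can be observed (a minimal sketch assuming the TransformerLens library and GPT-2 small, not the survey's or the original IOI study's exact methodology), the snippet below caches attention patterns on an IOI prompt and lists heads whose final position attends strongly to the indirect object's name:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

_, cache = model.run_with_cache(tokens)
str_tokens = model.to_str_tokens(prompt)
io_pos = str_tokens.index(" Mary")   # position of the indirect object's name
last_pos = tokens.shape[1] - 1       # position that predicts the next token

# Report heads whose final-position attention concentrates on " Mary";
# the 0.3 threshold is arbitrary and only for illustration.
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]            # [batch, n_heads, dest, src]
    attn_to_io = pattern[0, :, last_pos, io_pos]
    for head in range(model.cfg.n_heads):
        weight = attn_to_io[head].item()
        if weight > 0.3:
            print(f"Layer {layer}, head {head}: attention to ' Mary' = {weight:.2f}")
```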

Discovery Methods

To uncover the functions and mechanisms of attention heads, the survey categorizes experimental methodologies into Modeling-Free and Modeling-Required methods. Modeling-Free methods, such as directional addition, zero ablation, and naive activation patching, involve altering the latent states of the model without constructing new models. In contrast, Modeling-Required methods, such as probing and simplified model training, require training new models to explore specific functions.
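
To make the Modeling-Free category concrete, here is a minimal zero-ablation sketch, again assuming TransformerLens and GPT-2 small; the layer and head indices and the single-prompt logit comparison are illustrative placeholders, not findings reported in the survey.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

LAYER, HEAD = 9, 6  # placeholder indices for a hypothetical head under study

def zero_ablate_head(z, hook):
    # z: [batch, seq, n_heads, d_head] -- silence one head's output vector.
    z[:, :, HEAD, :] = 0.0
    return z

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", zero_ablate_head)],
)

# If the head matters for this behaviour, the logit of the expected
# completion (" Mary") should drop noticeably once the head is silenced.
mary_id = model.to_single_token(" Mary")
print("clean logit:  ", clean_logits[0, -1, mary_id].item())
print("ablated logit:", ablated_logits[0, -1, mary_id].item())
```

Naive activation patching follows the same pattern, except that instead of zeroing `z` the hook overwrites it with the corresponding activation cached from a second, corrupted prompt.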

Evaluation and Additional Topics

The survey highlights benchmarks and datasets used to evaluate the identified mechanisms and the overall performance of LLMs, including MMLU, TruthfulQA, and Needle-in-a-Haystack. Additionally, the paper touches on related topics such as the interpretability of feed-forward networks (FFNs) and machine psychology, emphasizing the complementary role of FFNs in the reasoning process and the potential for applying psychological paradigms to better understand LLMs.

Limitations and Future Directions

The paper acknowledges the limitations of current research, including the focus on narrow, task-specific settings, the lack of a robust framework describing how heads collaborate, and the absence of mathematical proofs for the discovered mechanisms. It outlines several future research directions, such as exploring mechanisms in more complex tasks, improving the robustness of discovered mechanisms to prompt variations, and integrating insights from machine psychology.

Conclusion

This survey provides a comprehensive overview of the interpretability of attention heads in LLMs, organized within a novel cognitive framework. By categorizing the functions of attention heads and exploring their collaborative mechanisms, the paper contributes to a deeper understanding of LLMs' internal reasoning processes. The recommended future directions and challenges emphasize the need for further research in this domain, ultimately aiming to enhance the interpretability and performance of LLMs in various applications.
