Attention Heads of Large Language Models: A Survey

(arXiv:2409.03752)
Published Sep 5, 2024 in cs.CL

Abstract

Since the advent of ChatGPT, LLMs have excelled in various tasks but remain largely black-box systems. Consequently, their development relies heavily on data-driven approaches, limiting performance enhancement through changes in internal architecture and reasoning pathways. As a result, many researchers have begun exploring the potential internal mechanisms of LLMs, aiming to identify the essence of their reasoning bottlenecks, with most studies focusing on attention heads. Our survey aims to shed light on the internal reasoning processes of LLMs by concentrating on the interpretability and underlying mechanisms of attention heads. We first distill the human thought process into a four-stage framework: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. Using this framework, we systematically review existing research to identify and categorize the functions of specific attention heads. Furthermore, we summarize the experimental methodologies used to discover these special heads, dividing them into two categories: Modeling-Free methods and Modeling-Required methods. We also outline relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions. Our reference list is open-sourced at https://github.com/IAAR-Shanghai/Awesome-Attention-Heads.

Figure: Collaborative mechanism of different attention heads in the IOI (Indirect Object Identification) task.

Overview

  • The paper surveys the mechanisms and functions of attention heads in LLMs, using a four-stage cognitive framework to categorize those functions.

  • It discusses how attention heads collaborate across model layers and presents methodologies for uncovering their functions, highlighting evaluation benchmarks and related interpretability topics.

  • The paper identifies limitations in current research and proposes future directions to enhance the interpretability and performance of LLMs.

An Essay on "Attention Heads of LLMs: A Survey"

The paper "Attention Heads of LLMs: A Survey," authored by Zifan Zheng et al., addresses a crucial aspect of the interpretability of LLMs by focusing on the mechanisms and functions of attention heads. As LLMs have gained popularity for their impressive performance across various tasks, understanding the internal workings of these models has become increasingly important. The survey provides a comprehensive overview of the interpretability efforts surrounding attention heads within LLMs, organizing the functions of these heads using an innovative four-stage cognitive framework.

The paper begins by acknowledging the remarkable performance of the Transformer architecture and its applications in NLP tasks. However, the black-box nature of deep neural networks, including LLMs, limits the understanding of their reasoning mechanisms. By shedding light on the internal processes of attention heads, the authors aim to enhance the interpretability of LLMs and improve their performance.

Four-Stage Cognitive Framework

The survey introduces a four-stage framework that mirrors the human thought process: Knowledge Recalling (KR), In-Context Identification (ICI), Latent Reasoning (LR), and Expression Preparation (EP). This framework is used to categorize the functions of specific attention heads and analyze their roles within LLMs' reasoning processes.

Knowledge Recalling (KR):

  • Attention heads in this stage are responsible for retrieving knowledge stored in the model's parameters.
  • Examples include associative memories, constant heads, and memory heads that focus on recalling relevant content based on the context.

In-Context Identification (ICI):

  • This stage focuses on identifying overall structural, syntactic, and semantic information within the context.
  • Attention heads such as positional heads, rare words heads, and duplicate token heads operate in this stage, capturing various types of contextual information.

Latent Reasoning (LR):

  • The LR stage involves synthesizing gathered information and performing logical reasoning.
  • Induction heads, summary readers, and task-specific heads like correct letter heads demonstrate the capabilities of attention heads during this stage (a sketch of how induction heads are typically scored follows this list).
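
As context for the induction heads mentioned above, the sketch below shows one common way such heads are scored in practice. This is a generic illustration rather than the survey's own procedure: the attention matrix `attn` and the helper `induction_score` are assumed here, with `attn` taken from a cached forward pass on a sequence of random tokens repeated twice.

```python
import numpy as np

def induction_score(attn: np.ndarray, half: int) -> float:
    """Score one head's attention pattern on a length-2*half sequence made of
    `half` random tokens repeated twice. An induction head should attend from
    each position i in the second half back to position i - half + 1, i.e. to
    the token that followed this token's first occurrence.

    attn: [seq_len, seq_len] attention weights (rows = destination positions,
          columns = source positions), e.g. one head's pattern from a cached run.
    """
    scores = [attn[i, i - half + 1] for i in range(half, 2 * half)]
    return float(np.mean(scores))

# Heads scoring close to 1.0 on repeated random sequences behave like
# induction heads; heads scoring near 0.0 do not.
```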

Expression Preparation (EP):

  • In the EP stage, the results of reasoning are formulated into a verbal expression.
  • Heads such as amplification heads, coherence heads, and mixed heads contribute to aligning the reasoning results with the output expression.

Collaborative Mechanisms and Methodologies

The paper also investigates how attention heads work together across different layers of the model. For instance, in the Indirect Object Identification (IOI) task, where the model must complete a sentence such as "When Mary and John went to the store, John gave a drink to" with "Mary" rather than "John", various heads collaborate through the KR, ICI, LR, and EP stages to determine the indirect object. This collaboration illustrates the interplay between different heads in achieving alignment between human and model reasoning.
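
As a rough illustration of how head-level behaviour on the IOI task can be observed (a minimal sketch assuming the TransformerLens library and GPT-2 small, not the survey's or the original IOI study's exact methodology), the snippet below caches attention patterns on an IOI prompt and lists heads whose final position attends strongly to the indirect object's name:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

_, cache = model.run_with_cache(tokens)
str_tokens = model.to_str_tokens(prompt)
io_pos = str_tokens.index(" Mary")   # position of the indirect object's name
last_pos = tokens.shape[1] - 1       # position that predicts the next token

# Report heads whose final-position attention concentrates on " Mary";
# the 0.3 threshold is arbitrary and only for illustration.
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]            # [batch, n_heads, dest, src]
    attn_to_io = pattern[0, :, last_pos, io_pos]
    for head in range(model.cfg.n_heads):
        weight = attn_to_io[head].item()
        if weight > 0.3:
            print(f"Layer {layer}, head {head}: attention to ' Mary' = {weight:.2f}")
```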

Discovery Methods

To uncover the functions and mechanisms of attention heads, the survey categorizes experimental methodologies into Modeling-Free and Modeling-Required methods. Modeling-Free methods, such as directional addition, zero ablation, and naive activation patching, involve altering the latent states of the model without constructing new models. In contrast, Modeling-Required methods, such as probing and simplified model training, require training new models to explore specific functions.
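
To make the Modeling-Free category concrete, here is a minimal zero-ablation sketch, again assuming TransformerLens and GPT-2 small; the layer and head indices and the single-prompt logit comparison are illustrative placeholders, not findings reported in the survey.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

LAYER, HEAD = 9, 6  # placeholder indices for a hypothetical head under study

def zero_ablate_head(z, hook):
    # z: [batch, seq, n_heads, d_head] -- silence one head's output vector.
    z[:, :, HEAD, :] = 0.0
    return z

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", zero_ablate_head)],
)

# If the head matters for this behaviour, the logit of the expected
# completion (" Mary") should drop noticeably once the head is silenced.
mary_id = model.to_single_token(" Mary")
print("clean logit:  ", clean_logits[0, -1, mary_id].item())
print("ablated logit:", ablated_logits[0, -1, mary_id].item())
```

Naive activation patching follows the same pattern, except that instead of zeroing `z` the hook overwrites it with the corresponding activation cached from a second, corrupted prompt.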

Evaluation and Additional Topics

The survey highlights benchmarks and datasets used to evaluate the identified mechanisms and the overall performance of LLMs, including MMLU, TruthfulQA, and Needle-in-a-Haystack. Additionally, the paper touches on related topics such as the interpretability of feed-forward networks (FFNs) and machine psychology, emphasizing the complementary role of FFNs in the reasoning process and the potential for applying psychological paradigms to better understand LLMs.

Limitations and Future Directions

The paper acknowledges the limitations of current research, including the focus on narrow, task-specific settings, the lack of a robust framework describing how heads collaborate, and the absence of mathematical proofs for the discovered mechanisms. It outlines several future research directions, such as exploring mechanisms in more complex tasks, improving the robustness of discovered mechanisms to prompt variations, and integrating insights from machine psychology.

Conclusion

This survey provides a comprehensive overview of the interpretability of attention heads in LLMs, organized within a novel cognitive framework. By categorizing the functions of attention heads and exploring their collaborative mechanisms, the paper contributes to a deeper understanding of LLMs' internal reasoning processes. The recommended future directions and challenges emphasize the need for further research in this domain, ultimately aiming to enhance the interpretability and performance of LLMs in various applications.
