- The paper demonstrates that even small knowledge circuits (under 10% of nodes) can preserve over 70% of transformer performance on factual tasks.
- It identifies key components like mover and relation heads that selectively activate with subject and relational queries, informing knowledge editing strategies.
- The study links failures in these circuits to hallucination phenomena and reveals dynamic modifications during in-context learning.
Knowledge Circuits in Pretrained Transformers
The paper "Knowledge Circuits in Pretrained Transformers" by Yao et al. investigates the mechanisms by which LLMs, specifically those based on Transformers like GPT2 and TinyLLAMA, store and process knowledge. Central to the exploration are what the authors term "knowledge circuits," which extend our understanding of neural representations within these models beyond isolated components to intricate interplays among multiple computational units.
The authors delineate the computational graph of LLMs, exploring how attention heads (including mover and relation heads) and multilayer perceptrons (MLPs) collaboratively encode and articulate knowledge. Through a series of experiments, the paper traces how specific neural circuitry within these models manages factual knowledge and contextual reasoning, and how it relates to behaviors such as hallucination and in-context learning.
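To make the notion of tracing a circuit concrete, the sketch below illustrates the kind of component-level attribution such an analysis relies on: zero-ablate a single attention head's output and measure how much the log-probability of the correct answer drops. This is a minimal illustration rather than the authors' exact discovery procedure; the choice of the TransformerLens library, the prompt, the answer token, and the small layer/head range scanned are all assumptions made for brevity.

```python
# Minimal sketch (not the paper's exact pipeline): score attention heads by
# zero-ablating their output and measuring the drop in the correct answer's
# log-probability. Uses the TransformerLens library and GPT-2 small.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "The official language of France is"
answer_id = model.to_single_token(" French")

clean_logits = model(prompt)
clean_logprob = clean_logits[0, -1].log_softmax(-1)[answer_id]

def ablation_score(layer: int, head: int) -> float:
    """How much the answer's log-prob falls when head (layer, head) is silenced."""
    def zero_head(value, hook):
        # value has shape [batch, seq, n_heads, d_head]; zero one head's output
        value[:, :, head, :] = 0.0
        return value

    patched_logits = model.run_with_hooks(
        prompt, fwd_hooks=[(f"blocks.{layer}.attn.hook_z", zero_head)]
    )
    patched_logprob = patched_logits[0, -1].log_softmax(-1)[answer_id]
    return (clean_logprob - patched_logprob).item()

# Scan a small subset of heads for brevity; a real analysis covers the full graph.
scores = {(l, h): ablation_score(l, h) for l in range(8, 12) for h in range(12)}
print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])  # heads this fact depends on
```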
Main Findings
- Knowledge Circuit Performance: The paper evaluates isolated knowledge circuits, revealing that even partial circuits (less than 10% of the model's nodes) can maintain a significant portion (over 70%) of the model's overall performance on knowledge-recall tasks. This underscores the robustness of the discovered representations. (A sketch of this kind of circuit-only evaluation follows this list.)
- Special Components in Knowledge Circuits: So-called "mover heads" and "relation heads" play crucial roles in handling subject information and relational context, respectively. Contrary to some earlier accounts, the paper finds that these attention heads are differentially activated by distinct types of knowledge-related queries (see the attention-pattern sketch after this list).
- Impact of Knowledge Editing: The efficacy of existing knowledge-editing techniques, such as ROME and fine-tuning of MLP layers, is assessed through the lens of these circuits. These methods mainly act on the edited layers directly and alter how information flows through the circuit, with ROME in particular handling newly injected facts in an initially idiosyncratic yet ultimately consistent way. (A sketch of the layer-restricted fine-tuning baseline also follows this list.)
- Understanding Behaviors via Circuits: The paper sheds light on how behaviors like hallucination may arise when certain heads ("mover" or "relation") fail to appropriately transfer knowledge across token positions. It also reveals that knowledge circuits are modified during in-context learning, with new attention heads emerging that attend to the contextual information supplied in the prompt.
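The first finding above (a small circuit preserving most of the model's performance) is typically checked by running the model with everything outside the candidate circuit ablated and comparing accuracy against the full model. The sketch below shows that pattern for attention heads only; the specific (layer, head) set, the prompts, and the use of zero-ablation are illustrative assumptions, and the paper's circuits also include MLP nodes.

```python
# Sketch of a circuit-only evaluation: zero-ablate every attention head outside
# a candidate circuit and compare top-1 factual accuracy with the full model.
# The circuit below is a made-up placeholder, not one reported in the paper.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
circuit_heads = {(9, 6), (10, 0), (11, 10)}  # hypothetical (layer, head) pairs

facts = [
    ("The Eiffel Tower is located in the city of", " Paris"),
    ("The official language of France is", " French"),
]

def run_circuit_only(prompt: str) -> torch.Tensor:
    def mask_heads(value, hook):
        # value: [batch, seq, n_heads, d_head]; keep only this layer's circuit heads
        keep = [h for (l, h) in circuit_heads if l == hook.layer()]
        mask = torch.zeros(value.shape[2], dtype=torch.bool, device=value.device)
        mask[keep] = True
        value[:, :, ~mask, :] = 0.0
        return value

    hooks = [(f"blocks.{l}.attn.hook_z", mask_heads) for l in range(model.cfg.n_layers)]
    return model.run_with_hooks(prompt, fwd_hooks=hooks)

hits_full = hits_circuit = 0
for prompt, answer in facts:
    answer_id = model.to_single_token(answer)
    hits_full += int(model(prompt)[0, -1].argmax().item() == answer_id)
    hits_circuit += int(run_circuit_only(prompt)[0, -1].argmax().item() == answer_id)
print(f"full model: {hits_full}/{len(facts)}  circuit-only: {hits_circuit}/{len(facts)}")
```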
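For the mover and relation heads in the second bullet, one simple heuristic for spotting mover-style heads is to look for heads whose attention from the final position concentrates on the subject tokens. The sketch below applies that heuristic; the prompt, the subject-span handling, and the 0.5 threshold are illustrative assumptions rather than the paper's criteria.

```python
# Hedged heuristic for spotting "mover"-style heads: find heads whose attention
# from the final token position concentrates on the subject tokens.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "The Eiffel Tower is located in the city of"
subject = "The Eiffel Tower"

tokens = model.to_tokens(prompt)                       # includes a BOS token at position 0
n_subject = model.to_tokens(subject, prepend_bos=False).shape[1]
_, cache = model.run_with_cache(tokens)

candidates = []
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]                  # [batch, head, query_pos, key_pos]
    # Attention mass the final position puts on the subject span (positions 1..n_subject)
    attn_to_subject = pattern[0, :, -1, 1:1 + n_subject].sum(-1)
    for head in range(model.cfg.n_heads):
        if attn_to_subject[head] > 0.5:                # arbitrary illustrative threshold
            candidates.append((layer, head, attn_to_subject[head].item()))

print(sorted(candidates, key=lambda t: -t[2])[:5])
```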
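Finally, for the knowledge-editing bullet, the sketch below shows the layer-localized fine-tuning baseline in its simplest form: freeze every parameter except one mid-layer MLP and take a few gradient steps on the edited statement. The layer index, learning rate, step count, and counterfactual fact are arbitrary, and this is not ROME, which instead applies a closed-form rank-one update to an MLP weight matrix.

```python
# Minimal sketch of layer-localized fine-tuning as a knowledge-editing baseline:
# freeze everything except one MLP block and optimize the edited fact's likelihood.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

edit_layer = 6  # hypothetical choice of layer to edit
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(f"transformer.h.{edit_layer}.mlp")

text = "The Eiffel Tower is located in the city of Rome"  # counterfactual edit target
inputs = tok(text, return_tensors="pt")
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=5e-4
)

model.train()
for _ in range(20):  # a handful of steps is usually enough to flip the prediction
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```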
Implications and Future Directions
The implications of this research are both practical and theoretical. Practically, discovering knowledge circuits aids in refining techniques for knowledge editing, offering a more nuanced method to target adjustments needed for bias rectification, misinformation correction, and enhanced reasoning capabilities in neural models. It provides a scaffolding upon which more consistent and accurate model edits could be built, adjusting specific flows of information in response to new facts or erroneous outputs.
Theoretically, this extension of circuit theory to LLMs offers a richer framework for conceptualizing neural knowledge encoding in an integrated manner, harmonizing contributions from both attention and feedforward layers. This points toward a potential unified theory of computational cognition within the Transformer architecture that mirrors the complex interdependencies observed in human knowledge retrieval and reasoning.
For future research, a primary avenue involves refining the granularity of these knowledge circuits down to neuron-level specificity. Additionally, investigating how these circuits form during pre-training, and how they could be leveraged or modified during fine-tuning, could yield insights into improving the adaptability and task-specific performance of LLMs.
Importantly, the paper does not position its findings as conclusive; rather, it suggests pathways for advancing interpretable learning and editing in neural models, guiding more informed designs in model training and adaptation strategies. Such research offers valuable insight into the evolving conversation about the internal mechanisms of AI systems, with potentially wide-reaching impact on their adoption in mixed-initiative cognitive workflows and explainable AI systems.