
Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph

(2404.03623)
Published Apr 4, 2024 in cs.CL, cs.AI, and cs.CY

Abstract

LLMs demonstrate an impressive capacity to recall a vast range of factual knowledge. However, unravelling the underlying reasoning of LLMs and explaining their internal mechanisms for exploiting this factual knowledge remain active areas of investigation. Our work analyzes the factual knowledge encoded in the latent representations of LLMs when prompted to assess the truthfulness of factual claims. We propose an end-to-end framework that jointly decodes the factual knowledge embedded in the latent space of LLMs from a vector space to a set of ground predicates and represents its evolution across the layers using a temporal knowledge graph. Our framework relies on activation patching, which intervenes in the inference computation of a model by dynamically altering its latent representations. Consequently, we rely on neither external models nor training processes. We showcase our framework with local and global interpretability analyses on two claim verification datasets: FEVER and CLIMATE-FEVER. The local interpretability analysis exposes latent errors ranging from representation errors to multi-hop reasoning errors, while the global analysis uncovers patterns in the underlying evolution of the model's factual knowledge (e.g., store-and-seek factual information). By enabling graph-based analyses of the latent representations, this work represents a step towards the mechanistic interpretability of LLMs.

A framework that decodes the factual knowledge in LLMs' latent representations into a temporal knowledge graph via activation patching.

Overview

  • The study introduces a framework for understanding how the factual knowledge encoded in LLMs' latent representations evolves across layers, represented as a temporal knowledge graph.

  • The framework relies on activation patching, a dynamic intervention in LLM inference that traces the evolution of factual knowledge without extra models or training.

  • Through the framework, the study finds recurring patterns of knowledge encoding: an early-layer focus on entity resolution, comprehensive factual encoding in the middle layers, and a late-layer decline in factual expressiveness.

  • This research provides insights into the operational dynamics of LLMs, revealing both local errors in reasoning and global patterns of knowledge evolution.

Unveiling the Dynamics of Factual Knowledge in LLMs through Latent Representations

Introduction to the Study

The study explores the factual knowledge encoded in the latent space of LLMs when they are challenged with the task of claim verification. It introduces an end-to-end framework that decodes the latent representations of LLMs into structured factual knowledge and traces its evolution across the model's layers using a temporal knowledge graph. Notably, the framework employs activation patching to intervene dynamically in model inference, removing the need for external models or additional training processes.

Understanding the Framework and Methodology

The proposed framework operates by interfacing with a model across its hidden layers during inference, extracting the semantics of factual claims. The process involves several key steps:

  • Preliminary Prompt Construction: The model receives semantically structured prompts that guide it to express factual claims as ground predicates (asserted or negated).
  • Latent Representation Patching: This phase intervenes in the model's computation by replacing the embedding of a designated token with a weighted summary of the source prompt's latent representations at a given layer (see the sketch after this list). This makes it possible to probe how the encoded knowledge evolves and is manipulated across layers.
  • Temporal Knowledge Graph Construction: The resulting predictions, structured as ground predicates, are translated into a knowledge graph in which the model's layers serve as the temporal dimension. This representation supports a granular analysis of how factual knowledge transforms throughout the inference process.
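To make the patching step concrete, here is a minimal, hypothetical sketch in PyTorch/Transformers. It assumes a Llama-style decoder-only model whose blocks are exposed as `model.model.layers`; the model name, the placeholder-token position, the target layer, and the use of a plain mean in place of the paper's weighted summary are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of layer-wise activation patching (not the authors' released code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any decoder-only LLM with .model.layers
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def patch_layer(layer_idx, target_pos, patch_vector):
    """Overwrite the hidden state at `target_pos` in layer `layer_idx` during the forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        if hidden.shape[1] > target_pos:          # patch only while the full prompt is in view
            hidden[:, target_pos, :] = patch_vector
    return model.model.layers[layer_idx].register_forward_hook(hook)

# 1) Run the source prompt (the factual claim) and cache its hidden states.
claim = "The Eiffel Tower is located in Berlin."
src = tok(claim, return_tensors="pt")
with torch.no_grad():
    src_out = model(**src, output_hidden_states=True)

# 2) Summarise the claim's latent representation at a chosen layer.
#    (A plain mean over token positions stands in for the paper's weighted summary.)
layer = 16
summary = src_out.hidden_states[layer].mean(dim=1).squeeze(0)

# 3) Patch the summary into a designated placeholder token of a decoding prompt,
#    then generate ground predicates describing the knowledge encoded at that layer.
decode_prompt = "Claim: X. Facts about the claim:"  # "X" acts as the placeholder token
dst = tok(decode_prompt, return_tensors="pt")
target_pos = 2                                      # assumption: position of "X" after tokenisation
handle = patch_layer(layer, target_pos, summary)
with torch.no_grad():
    gen = model.generate(**dst, max_new_tokens=40)
handle.remove()
print(tok.decode(gen[0], skip_special_tokens=True))
```

Repeating the last step for every layer yields, per layer, a set of asserted or negated ground predicates that can then be assembled into the temporal knowledge graph described above.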

Results and Implications

This study unveils several key findings regarding the latent dynamics of factual knowledge within LLMs. The local interpretability analysis exposes latent errors, ranging from entity resolution faults to multi-hop reasoning faults. Globally, it reveals distinct patterns of knowledge evolution: a focus on entity resolution in the early layers, comprehensive encoding of factual knowledge about subject entities in the middle layers, and a decline in factual expressiveness in the final layers.
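As a rough illustration of how such layer-wise patterns can be read off the temporal knowledge graph, the following sketch (with invented predicates, not real model output) collects per-layer ground predicates into a graph keyed by triple and counts how many asserted facts each layer expresses.

```python
# Hypothetical sketch: the model's layers as the temporal dimension of a knowledge graph.
from collections import defaultdict

# Illustrative per-layer decodings: (subject, relation, object, asserted?)
layer_predicates = {
    4:  [("Eiffel Tower", "is_a", "tower", True)],
    16: [("Eiffel Tower", "located_in", "Paris", True),
         ("Eiffel Tower", "located_in", "Berlin", False)],
    30: [("Eiffel Tower", "located_in", "Paris", True)],
}

# Temporal knowledge graph: triple -> chronologically ordered (layer, polarity) pairs.
tkg = defaultdict(list)
for layer, predicates in sorted(layer_predicates.items()):
    for subj, rel, obj, asserted in predicates:
        tkg[(subj, rel, obj)].append((layer, asserted))

# Global-style analysis: how many asserted facts does each layer express?
asserted_per_layer = defaultdict(int)
for history in tkg.values():
    for layer, asserted in history:
        asserted_per_layer[layer] += int(asserted)

for layer in sorted(asserted_per_layer):
    print(f"layer {layer:2d}: {asserted_per_layer[layer]} asserted ground predicates")
```

A rising-then-falling count across layers would echo the reported pattern of middle-layer factual richness followed by a late-layer decline in expressiveness.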

Concluding Remarks

This work represents a significant step forward in understanding the internal mechanisms of LLMs, particularly how they encode, manipulate, and apply factual knowledge. By leveraging a patching-based approach, the framework opens new avenues for probing the under-explored latent spaces of LLMs, offering insights into their operational dynamics without relying on external models or additional training. Future research could extend the framework to explore how larger context sizes interact with the resolution of factual knowledge within LLMs.
