
Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph

(2404.03623)
Published Apr 4, 2024 in cs.CL, cs.AI, and cs.CY

Abstract

LLMs demonstrate an impressive capacity to recall a vast range of factual knowledge. However, unravelling the underlying reasoning of LLMs and explaining their internal mechanisms for exploiting this factual knowledge remain active areas of investigation. Our work analyzes the factual knowledge encoded in the latent representations of LLMs when prompted to assess the truthfulness of factual claims. We propose an end-to-end framework that jointly decodes the factual knowledge embedded in the latent space of LLMs from a vector space to a set of ground predicates and represents its evolution across the layers using a temporal knowledge graph. Our framework relies on activation patching, which intervenes in the inference computation of a model by dynamically altering its latent representations. Consequently, we rely on neither external models nor training processes. We showcase our framework with local and global interpretability analyses on two claim verification datasets: FEVER and CLIMATE-FEVER. The local interpretability analysis exposes latent errors ranging from representation errors to multi-hop reasoning errors, while the global analysis uncovers patterns in the underlying evolution of the model's factual knowledge (e.g., store-and-seek factual information). By enabling graph-based analyses of the latent representations, this work represents a step towards the mechanistic interpretability of LLMs.

A framework that decodes the factual knowledge in LLMs' latent representations into a temporal knowledge graph via activation patching.

Overview

  • The study introduces a framework for understanding how the factual knowledge encoded in LLMs' latent representations evolves across layers, represented as a temporal knowledge graph.

  • The framework relies on activation patching, a dynamic intervention in LLM inference that traces the evolution of factual knowledge without extra models or training.

  • Through the framework, the study finds recurring patterns of knowledge encoding: an early-layer focus on entity resolution, comprehensive factual encoding in the middle layers, and a late-layer decline in factual expressiveness.

  • This research provides insights into the operational dynamics of LLMs, revealing both local errors in reasoning and global patterns of knowledge evolution.

Unveiling the Dynamics of Factual Knowledge in LLMs through Latent Representations

Introduction to the Study

The study explores the factual knowledge encoded in the latent space of LLMs when they are challenged with the task of claim verification. It introduces an end-to-end framework that decodes the latent representations of LLMs into structured factual knowledge and traces its evolution across the model's layers using a temporal knowledge graph. Notably, the framework employs activation patching to intervene dynamically in model inference, removing the need for external models or additional training processes.

Understanding the Framework and Methodology

The proposed framework operates by interfacing with a model across its hidden layers during inference, extracting the semantics of factual claims. The process involves several key steps:

  • Preliminary Prompt Construction: The model receives semantically structured prompts that guide it to express factual claims as ground predicates (asserted or negated).
  • Latent Representation Patching: This phase intervenes in the model's computation by replacing the embedding of a designated token with a weighted summary of the source prompt's latent representations at a given layer (see the sketch after this list). This makes it possible to probe how the encoded knowledge evolves and is manipulated across layers.
  • Temporal Knowledge Graph Construction: The resulting predictions, structured as ground predicates, are translated into a knowledge graph in which the model's layers serve as the temporal dimension. This representation supports a granular analysis of how factual knowledge transforms throughout the inference process.
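To make the patching step concrete, here is a minimal, hypothetical sketch in PyTorch/Transformers. It assumes a Llama-style decoder-only model whose blocks are exposed as `model.model.layers`; the model name, the placeholder-token position, the target layer, and the use of a plain mean in place of the paper's weighted summary are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of layer-wise activation patching (not the authors' released code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any decoder-only LLM with .model.layers
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def patch_layer(layer_idx, target_pos, patch_vector):
    """Overwrite the hidden state at `target_pos` in layer `layer_idx` during the forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        if hidden.shape[1] > target_pos:          # patch only while the full prompt is in view
            hidden[:, target_pos, :] = patch_vector
    return model.model.layers[layer_idx].register_forward_hook(hook)

# 1) Run the source prompt (the factual claim) and cache its hidden states.
claim = "The Eiffel Tower is located in Berlin."
src = tok(claim, return_tensors="pt")
with torch.no_grad():
    src_out = model(**src, output_hidden_states=True)

# 2) Summarise the claim's latent representation at a chosen layer.
#    (A plain mean over token positions stands in for the paper's weighted summary.)
layer = 16
summary = src_out.hidden_states[layer].mean(dim=1).squeeze(0)

# 3) Patch the summary into a designated placeholder token of a decoding prompt,
#    then generate ground predicates describing the knowledge encoded at that layer.
decode_prompt = "Claim: X. Facts about the claim:"  # "X" acts as the placeholder token
dst = tok(decode_prompt, return_tensors="pt")
target_pos = 2                                      # assumption: position of "X" after tokenisation
handle = patch_layer(layer, target_pos, summary)
with torch.no_grad():
    gen = model.generate(**dst, max_new_tokens=40)
handle.remove()
print(tok.decode(gen[0], skip_special_tokens=True))
```

Repeating the last step for every layer yields, per layer, a set of asserted or negated ground predicates that can then be assembled into the temporal knowledge graph described above.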

Results and Implications

This study unveils several key findings regarding the latent dynamics of factual knowledge within LLMs. The local interpretability analysis exposes latent errors, ranging from entity resolution faults to multi-hop reasoning faults. Globally, it reveals distinct patterns of knowledge evolution: a focus on entity resolution in the early layers, comprehensive encoding of factual knowledge about subject entities in the middle layers, and a decline in factual expressiveness in the final layers.
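As a rough illustration of how such layer-wise patterns can be read off the temporal knowledge graph, the following sketch (with invented predicates, not real model output) collects per-layer ground predicates into a graph keyed by triple and counts how many asserted facts each layer expresses.

```python
# Hypothetical sketch: the model's layers as the temporal dimension of a knowledge graph.
from collections import defaultdict

# Illustrative per-layer decodings: (subject, relation, object, asserted?)
layer_predicates = {
    4:  [("Eiffel Tower", "is_a", "tower", True)],
    16: [("Eiffel Tower", "located_in", "Paris", True),
         ("Eiffel Tower", "located_in", "Berlin", False)],
    30: [("Eiffel Tower", "located_in", "Paris", True)],
}

# Temporal knowledge graph: triple -> chronologically ordered (layer, polarity) pairs.
tkg = defaultdict(list)
for layer, predicates in sorted(layer_predicates.items()):
    for subj, rel, obj, asserted in predicates:
        tkg[(subj, rel, obj)].append((layer, asserted))

# Global-style analysis: how many asserted facts does each layer express?
asserted_per_layer = defaultdict(int)
for history in tkg.values():
    for layer, asserted in history:
        asserted_per_layer[layer] += int(asserted)

for layer in sorted(asserted_per_layer):
    print(f"layer {layer:2d}: {asserted_per_layer[layer]} asserted ground predicates")
```

A rising-then-falling count across layers would echo the reported pattern of middle-layer factual richness followed by a late-layer decline in expressiveness.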

Concluding Remarks

This work represents a significant step forward in understanding the internal mechanisms of LLMs, particularly how they encode, manipulate, and apply factual knowledge. By leveraging a patching-based approach, the framework opens new avenues for probing the under-explored latent spaces of LLMs, offering insights into their operational dynamics without relying on external models or additional training. Future research could extend the framework to explore how larger context sizes interact with the resolution of factual knowledge within LLMs.
