
Structured World Representations in Maze-Solving Transformers

(2312.02566)
Published Dec 5, 2023 in cs.LG and cs.AI

Abstract

Transformer models underpin many recent advances in practical machine learning applications, yet understanding their internal behavior continues to elude researchers. Given the size and complexity of these models, forming a comprehensive picture of their inner workings remains a significant challenge. To this end, we set out to understand small transformer models in a more tractable setting: that of solving mazes. In this work, we focus on the abstractions formed by these models and find evidence for the consistent emergence of structured internal representations of maze topology and valid paths. We demonstrate this by showing that the residual stream of only a single token can be linearly decoded to faithfully reconstruct the entire maze. We also find that the learned embeddings of individual tokens have spatial structure. Furthermore, we take steps towards deciphering the circuitry of path-following by identifying attention heads (dubbed "adjacency heads"), which are implicated in finding valid subsequent tokens.

Sweep results show that models can perform well without a linearly decodable maze representation, though the top-performing models do exhibit one.

Overview

  • The study examines how simple transformer AI models can learn to solve mazes, potentially offering insights into the strategies of more complex transformer models.

  • Two transformer models were trained on different types of mazes, using tokens to represent maze pathways and solving them as a sequence prediction problem.

  • Interpretability techniques unveiled that transformers can internalize maze layouts in their hidden layers with specific 'adjacency heads' playing a pivotal role.

  • 'Grokking'-like spikes in performance indicated a link between structured understanding of mazes and the generalization abilities of the transformers.

  • The research contributes to the broader goal of understanding if AI can abstract complex structures, informing the design of network architectures for various tasks.

Introduction to Transformers in Maze-Solving

Transformers, originally designed for language processing, are intriguing for their potential across a wide range of tasks, including maze-solving. A large body of interpretability research has dissected these networks to better understand their learning mechanisms, uncovering mechanistic components such as induction heads that support sequence completion. Understanding simple transformers, the starting point of this study, could offer clues about the learning strategies of their more complex counterparts.

Experimenting with Maze Transformers

The investigation trains small transformers to solve mazes presented as sequences of tokens. Two primary models were trained: a smaller one on "forkless" mazes and a larger one on complex mazes with multiple decision points. Mazes were generated with the randomized depth-first search (RDFS) algorithm and its variants, covering a range of connectivity challenges. The models' task, predicting the token sequence that solves the maze, can be seen as a form of offline reinforcement learning cast as text prediction.
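A minimal sketch of this setup, using an illustrative tokenization in which each maze is serialized as an adjacency list followed by an origin, target, and solution path; the token names and exact format here are assumptions, not necessarily the paper's scheme:

```python
import random

def generate_rdfs_maze(n):
    """Generate an n x n maze with randomized depth-first search.
    Returns a set of undirected edges (carved passages) between adjacent cells."""
    visited, edges = {(0, 0)}, set()
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < n and 0 <= y + dy < n
                     and (x + dx, y + dy) not in visited]
        if neighbors:
            nxt = random.choice(neighbors)
            edges.add(frozenset(((x, y), nxt)))  # carve a passage to the chosen neighbor
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()  # dead end: backtrack
    return edges

def tokenize_maze(edges, origin, target, path):
    """Serialize a maze and its solution as a flat token sequence."""
    tokens = ["<ADJLIST_START>"]
    for edge in edges:
        a, b = sorted(edge)
        tokens += [f"({a[0]},{a[1]})", "<-->", f"({b[0]},{b[1]})", ";"]
    tokens += ["<ADJLIST_END>", "<ORIGIN>", f"({origin[0]},{origin[1]})",
               "<TARGET>", f"({target[0]},{target[1]})", "<PATH_START>"]
    tokens += [f"({c[0]},{c[1]})" for c in path]
    tokens.append("<PATH_END>")
    return tokens
```

Trained autoregressively on such sequences, the model sees the maze description and endpoints as a prompt and must predict the path tokens that follow.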

Insights from Interpretability Techniques

The study applied multiple interpretability methods to unpack the trained models. A single token's residual stream can be linearly decoded to reconstruct the maze's layout, implying an internalized representation within the transformer's hidden activations. Attention analysis revealed specific heads, dubbed "adjacency heads," implicated in identifying valid subsequent tokens along the maze's pathways. Together, these findings suggest that the residual stream at even a single token position can carry information about the entire structure of the maze.
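A minimal sketch of such a linear probe, assuming the residual-stream activations at a single token position have already been extracted into an array `acts` of shape `[n_mazes, d_model]` and the maze connectivity flattened into binary labels `edges` of shape `[n_mazes, n_possible_edges]`; these names and shapes are illustrative rather than the paper's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_connectivity_probe(acts, edges, train_frac=0.8):
    """Fit one linear probe per possible maze connection and report held-out accuracy."""
    n = int(train_frac * len(acts))
    probes, accs = [], []
    for j in range(edges.shape[1]):
        y_train, y_test = edges[:n, j], edges[n:, j]
        if y_train.min() == y_train.max():
            # degenerate column (connection always present or always absent in training data)
            probes.append(None)
            accs.append(float((y_test == y_train[0]).mean()))
            continue
        clf = LogisticRegression(max_iter=1000).fit(acts[:n], y_train)
        probes.append(clf)
        accs.append(clf.score(acts[n:], y_test))
    return probes, float(np.mean(accs))
```

Held-out accuracy near chance would indicate no linearly decodable maze representation at that token position; faithfully reconstructing the maze corresponds to accuracy well above that.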

Understanding Model Learning and Representation

Probing experiments showed that the models do learn structured representations of mazes. The study also observed 'grokking'-like spikes, where task performance abruptly improved, often coinciding with an improved ability to decode maze representations from the residual stream. This hints at a potential causal link between structured internal representations and the transformers' generalization capabilities.
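One way to examine this relationship is to track task accuracy and probe accuracy together across saved training checkpoints. The sketch below reuses the probe from the previous snippet and assumes hypothetical helpers `get_activations` and `solve_accuracy` for extracting activations and evaluating maze-solving performance; both are placeholders, not functions from the paper's codebase:

```python
def probe_vs_performance(checkpoints, mazes):
    """For each (step, model) checkpoint, record maze-solving accuracy and
    linear-probe decoding accuracy, so abrupt jumps in the two can be compared."""
    history = []
    for step, model in checkpoints:
        acts, edges = get_activations(model, mazes)      # assumed extraction helper
        _, probe_acc = fit_connectivity_probe(acts, edges)
        history.append({
            "step": step,
            "solve_acc": solve_accuracy(model, mazes),   # assumed evaluation helper
            "probe_acc": probe_acc,
        })
    return history
```

A grokking-like jump in solving accuracy that coincides with a jump in probe accuracy is the pattern described above.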

In essence, the study advances our understanding of how transformers can abstract complex environmental structures, and how those abstractions are reflected in their maze-solving performance. The results serve as a springboard for future work on whether such representations are a common thread across network architectures and tasks.
