Humans and language models diverge when predicting repeating text

(2310.06408)
Published Oct 10, 2023 in cs.CL

Abstract

Language models that are trained on the next-word prediction task have been shown to accurately model human behavior in word prediction and reading speed. In contrast with these findings, we present a scenario in which the performance of humans and LMs diverges. We collected a dataset of human next-word predictions for five stimuli that are formed by repeating spans of text. Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory (or in-context learning) begins to play a role. We traced the cause of this divergence to specific attention heads in a middle layer. Adding a power-law recency bias to these attention heads yielded a model that performs much more similarly to humans. We hope that this scenario will spur future work in bringing LMs closer to human behavior.

Overview

  • The study compares human word prediction to that of the GPT-2 language model, focusing on how each handles repeating text.

  • Humans show slight improvement in predicting repeated text sequences, while GPT-2 rapidly achieves near-perfect performance.

  • Analysis reveals specific attention heads in GPT-2's architecture enable it to excel in recognizing repeating sequences.

  • A novel adjustment to GPT-2, introducing a recency bias, led to more human-like performance but reduced accuracy on new text.

  • The research suggests tailoring LMs to imitate human memory patterns may bring AI closer to reflecting human cognition.

In recently published research, the authors compared human cognitive behavior with the performance of language models (LMs), specifically the GPT-2 model, on word prediction. The study centered on a compelling question: do humans and LMs diverge when predicting repeating text?

The study first set up an experiment in which human participants predicted the next word in text spans that were repeated up to four times. Predictably, human performance improved slightly with each repetition, as growing familiarity with the text helped refine their predictions.

In stark contrast, the study revealed that the GPT-2 model excelled after just one repetition, achieving nearly perfect performance from then on. This sharp deviation pointed to a fundamental difference in memory mechanisms: humans rely on relatively fallible short-term memory, while GPT-2 leverages its context window to recognize and recall repeated sequences with almost flawless precision.
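To make the setup concrete, here is a minimal sketch of this kind of measurement using the Hugging Face transformers library; the sample span and repetition count are illustrative, not the paper's actual stimuli or protocol.

```python
# Minimal sketch: score GPT-2's top-1 next-token accuracy on each
# repetition of a short span. The span below is illustrative; the
# paper used five longer, naturalistic stimuli.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# A leading space keeps the span's tokenization identical across
# repetitions when the token ids are concatenated.
span_ids = tokenizer(" The quick brown fox jumps over the lazy dog.").input_ids
n_reps, span_len = 4, len(span_ids)
ids = torch.tensor([span_ids * n_reps])

with torch.no_grad():
    logits = model(ids).logits  # (1, seq_len, vocab_size)

# The prediction at position t targets the token at position t + 1.
correct = (logits[0, :-1].argmax(dim=-1) == ids[0, 1:]).float()

for i in range(n_reps):
    # Predictions of the tokens belonging to repetition i.
    chunk = correct[max(i * span_len - 1, 0) : (i + 1) * span_len - 1]
    print(f"repetition {i + 1}: top-1 accuracy = {chunk.mean():.2f}")
```

On a pattern like this, accuracy typically jumps to near 1.0 from the second repetition onward, which is the LM side of the divergence the paper describes.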

Upon further analysis, the researchers identified specific attention heads in a middle layer of GPT-2's architecture that facilitate this pattern recognition. These findings cast doubt on the previously held belief that LMs closely mimic human cognitive functions.
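One assumed way to hunt for such heads (illustrative, not necessarily the paper's exact analysis) is to score every head by how strongly tokens in a second repetition attend to the position just after their previous occurrence, the signature of "induction"-style copying:

```python
# Assumed head-hunting sketch: for each layer and head, measure how much
# attention tokens in repetition 2 pay to the position right after their
# match in repetition 1 (the induction/copying pattern).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

span_ids = tokenizer(" The quick brown fox jumps over the lazy dog.").input_ids
span_len = len(span_ids)
ids = torch.tensor([span_ids * 2])  # two repetitions

with torch.no_grad():
    out = model(ids, output_attentions=True)

# out.attentions: one (1, n_heads, seq_len, seq_len) tensor per layer.
q = torch.arange(span_len, 2 * span_len)  # query positions in repetition 2
k = q - span_len + 1                      # position after the earlier match

for layer, attn in enumerate(out.attentions):
    score = attn[0][:, q, k].mean(dim=-1)  # per-head induction score
    best = int(score.argmax())
    print(f"layer {layer:2d}: head {best:2d} scores {score[best]:.2f}")
```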

Seeking to bridge this gap, the researchers added a power-law recency bias to these attention heads, skewing them to weight recent tokens more heavily than older ones and simulating a recency effect akin to human memory. With this adjustment, the model's behavior much more closely resembled that of the human participants, suggesting that such modifications could make LMs better proxies for human cognition.
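The summary does not spell out the parameterization, but a minimal sketch of a power-law recency bias might multiply each post-softmax attention weight by (distance + 1)^(-alpha) and renormalize; the exponent and the choice to apply the bias after the softmax are assumptions for illustration.

```python
# Hedged sketch of a power-law recency bias on causal attention weights.
# The functional form, alpha, and where the bias is applied are assumed;
# the paper's exact formulation may differ.
import torch

def recency_biased_attention(weights: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """weights: (..., seq_len, seq_len) post-softmax causal attention."""
    seq_len = weights.shape[-1]
    qpos = torch.arange(seq_len).unsqueeze(-1)  # query positions (rows)
    kpos = torch.arange(seq_len).unsqueeze(0)   # key positions (columns)
    dist = (qpos - kpos).clamp(min=0)           # query-to-key distance
    decay = (dist + 1.0) ** (-alpha)            # power-law decay in distance
    biased = weights * decay
    return biased / biased.sum(dim=-1, keepdim=True)  # renormalize each row

# Usage on a toy causal attention pattern.
T = 8
raw = torch.rand(1, T, T).tril()           # lower triangle = causal mask
attn = raw / raw.sum(-1, keepdim=True)     # rows sum to 1
print(recency_biased_attention(attn, alpha=0.5)[0])
```

Intuitively, such a bias suppresses attention to distant repeats, pushing the model's recall of earlier spans toward the fallibility of human short-term memory.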

However, this human-like performance came at a cost: the model's overall word-prediction accuracy decreased on non-repeating text. The trade-off suggests that the LM's exceptional prediction capabilities owe more to its superior in-context recall than to any mimicry of human thought processes.

In conclusion, the study not only shed light on the distinct memory operations of humans and LMs but also proposed potential steps forward. The work implies that refining LMs to exhibit human-like memory patterns may bring us closer to AI that genuinely reflects human cognitive processes. The findings also hint at optimization opportunities in LM design, perhaps leading to more efficient and effective artificial intelligence in the future.
