
Abstract

LLMs exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well-understood. Some works suggest that LLMs only recall already learned concepts from pre-training, while others hint that ICL performs implicit learning over demonstrations. We characterize two ways through which ICL leverages demonstrations. Task recognition (TR) captures the extent to which LLMs can recognize a task through demonstrations -- even without ground-truth labels -- and apply their pre-trained priors, whereas task learning (TL) is the ability to capture new input-label mappings unseen in pre-training. Using a wide range of classification datasets and three LLM families (GPT-3, LLaMA and OPT), we design controlled experiments to disentangle the roles of TR and TL in ICL. We show that (1) models can achieve non-trivial performance with only TR, and TR does not further improve with larger models or more demonstrations; (2) LLMs acquire TL as the model scales, and TL's performance consistently improves with more demonstrations in context. Our findings unravel two different forces behind ICL and we advocate for discriminating them in future ICL research due to their distinct nature.

Overview

  • The paper investigates the mechanisms behind in-context learning (ICL) in LLMs, distinguishing between task recognition (TR) and task learning (TL).

  • It runs controlled experiments across different model sizes and numbers of demonstrations on classification tasks to analyze the contributions of TR and TL to LLM performance.

  • Findings indicate that TR accounts for a significant share of LLM performance across scales but does not improve with larger models or more examples. Conversely, TL emerges mainly in larger models and strengthens with more demonstrations, showing that such models can learn genuinely new input-label mappings.

  • The research highlights the importance of distinguishing TR from TL when deploying LLMs, suggesting that larger models and more demonstrations are needed for tasks that require learning new input-label mappings.

Disentangling Task Recognition and Task Learning in LLMs

Introduction to In-Context Learning in LLMs

In-context learning (ICL) lets LLMs adapt to new tasks when a few input-label demonstration pairs are included in the prompt. This study, led by researchers at Princeton University, examines the mechanisms by which LLMs perform ICL, distinguishing task recognition (TR) from task learning (TL): TR is the ability to identify the task from the demonstrations and apply pre-trained priors, even without ground-truth labels, whereas TL is the ability to learn new input-label mappings from the demonstrations themselves.
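
To make this concrete, here is a minimal sketch of how a few-shot ICL prompt for binary sentiment classification might be assembled. The review texts, label names, and prompt template are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a few-shot ICL prompt for sentiment classification.
# The demonstrations, labels, and template are illustrative assumptions.
def build_prompt(demonstrations, test_input):
    """Concatenate labeled demonstrations, then the unlabeled test input."""
    prompt = ""
    for text, label in demonstrations:
        prompt += f"Review: {text}\nSentiment: {label}\n\n"
    return prompt + f"Review: {test_input}\nSentiment:"

demos = [
    ("The film was a delight from start to finish.", "positive"),
    ("A tedious, overlong mess.", "negative"),
]
print(build_prompt(demos, "Surprisingly heartfelt and well acted."))
# The LLM is expected to continue the prompt with "positive" or "negative".
```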

Experimental Approach and Findings

The authors designed controlled experiments that manipulate the demonstration labels across a range of classification datasets and compare models from three LLM families: GPT-3, LLaMA, and OPT. Alongside the standard gold-label setting, they isolate TR by replacing labels with ones drawn at random from the label space, and probe TL by mapping labels to abstract, semantically void symbols whose correspondence must be learned from the demonstrations. This design lets them track how each mechanism evolves with model size and the number of demonstrations.
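
A hedged sketch of these three label settings follows; the label space, abstract symbols, and function names are assumptions for illustration, not the paper's exact configuration.

```python
import random

# Sketch of the three demonstration-label settings: gold, random, and abstract.
# Label names and abstract symbols are assumed for illustration.
label_space = ["positive", "negative"]
abstract_map = {"positive": "A", "negative": "B"}  # semantically void symbols

def relabel(demonstrations, setting):
    """Return (text, label) pairs under the chosen labeling setting."""
    relabeled = []
    for text, gold in demonstrations:
        if setting == "gold":        # ground-truth labels: TR and TL both available
            label = gold
        elif setting == "random":    # labels sampled uniformly from the label space: isolates TR
            label = random.choice(label_space)
        elif setting == "abstract":  # consistent but unfamiliar symbols: success requires TL
            label = abstract_map[gold]
        else:
            raise ValueError(f"unknown setting: {setting}")
        relabeled.append((text, label))
    return relabeled
```

Under the random setting, any accuracy above chance must come from recognizing the task itself, since the demonstration labels carry no information; under the abstract setting, the model can only succeed by learning the new symbol mapping from the demonstrations.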

The results show that TR alone yields non-trivial performance across model scales but does not benefit from increased model size or additional demonstrations. TL, by contrast, emerges only at larger scales and improves consistently with more demonstrations: in the abstract-label setting, larger models given more examples substantially outperform smaller ones, indicating genuine task learning rather than mere recognition and suggesting a shift in ICL dynamics at scale.
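
As a rough illustration of how such a comparison could be run, the sketch below builds on the earlier snippets; `query_model` is a hypothetical stand-in for an actual LLM call, not an API from the paper.

```python
# Hedged sketch of the evaluation loop, reusing build_prompt, relabel, and
# abstract_map from the earlier sketches. `query_model(model_name, prompt)`
# is a hypothetical helper standing in for a real LLM query.
def evaluate(model_name, test_set, train_pool, setting, k, query_model):
    """Accuracy of one model under one label setting with k demonstrations."""
    correct = 0
    for text, gold in test_set:
        demos = relabel(random.sample(train_pool, k), setting)
        pred = query_model(model_name, build_prompt(demos, text)).strip()
        target = abstract_map[gold] if setting == "abstract" else gold
        correct += int(pred == target)
    return correct / len(test_set)

# Pattern reported in the paper: accuracy under the random setting (TR) stays
# roughly flat as model size and k grow, while accuracy under the abstract
# setting (TL) rises with both.
```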

Theoretical and Practical Implications

This exploration of ICL's underlying mechanisms elucidates the dual nature of how LLMs adapt to new tasks. On one side, TR leverages extensive pre-training to recognize and apply known patterns, evidence of LLMs' ability to exploit prior knowledge. On the other, the emergence of TL at larger scales shows that these models can absorb new input-label mappings from context, going beyond what was fixed during pre-training.

Practically, this distinction informs the design and deployment of LLMs for specific tasks: while smaller models may suffice for tasks closely aligned with pre-trained capabilities, leveraging LLMs for novel tasks (requiring genuine learning from examples) necessitates larger models and richer demonstrations.

Forward Look

The findings advocate for a nuanced interpretation of ICL and a model- and demonstration-aware approach to harnessing LLMs' full potential. Future research could probe the threshold at which TL becomes prominent, the role of domain specificity in the efficacy of TR and TL, and extensions of the analysis beyond classification to a broader range of tasks. The study thus charts a course for understanding the forces behind ICL and suggests a roadmap for tailored, efficient, and effective use of these models in diverse settings.
