Larger language models do in-context learning differently

(arXiv:2303.03846)
Published Mar 7, 2023 in cs.CL

Abstract

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.

Figure: Overview of the flipped-label and semantically-unrelated label in-context learning (ICL) setups.

Overview

  • The paper investigates how language models of varying size perform in-context learning (ICL): whether they rely on semantic priors or learn new input-to-label mappings directly from examples.

  • Larger models demonstrate an ability to override semantic priors when faced with flipped labels, suggesting a scale-dependent capability to adapt to new mappings.

  • In setups where labels have no semantic relation to inputs, larger models outperform smaller ones by establishing new mappings, indicating a size-related emergence of learning flexibility.

  • The study finds that instruction-tuned models perform better in learning new mappings but struggle more to override existing semantic priors, highlighting a nuanced interplay between semantic reliance and novel mapping acquisition.

Exploring the Impact of Semantic Priors and Input-Label Mappings on In-Context Learning Across Model Scales

Overview of In-Context Learning Variations

The study analyzes whether language models perform in-context learning (ICL) by drawing on semantic priors internalized during pretraining or by learning input-label mappings directly from the presented exemplars. Experiments cover two ICL setups, flipped-label ICL and semantically-unrelated label ICL (SUL-ICL), across several model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM).
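
To make the baseline setup concrete, here is a minimal sketch of how a standard few-shot ICL prompt is assembled. The dataset, label names, and prompt format are illustrative assumptions, not the paper's exact protocol: exemplars are serialized as input-label pairs, followed by a query whose label the model must complete.

```python
# Minimal few-shot ICL prompt sketch (illustrative data and format,
# not the paper's exact datasets or template).
exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("A tedious, joyless slog.", "negative"),
    ("Sharp writing and great performances.", "positive"),
    ("I walked out halfway through.", "negative"),
]
query = "An instant classic."

# Serialize exemplars, then append the unlabeled query for the model to complete.
prompt = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in exemplars)
prompt += f"Input: {query}\nLabel:"
print(prompt)
```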

Findings on Semantic Priors Override

Experiments with flipped-label ICL reveal an emergent ability of large models to override ingrained semantic priors: when the exemplar labels are flipped, large models adjust their predictions to follow the new mapping, and this ability strengthens with model scale. In contrast, smaller models largely ignore the flipped labels and continue to answer according to their pretraining-derived semantic priors.
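
A hedged sketch of the flipped-label manipulation, under the same illustrative assumptions as the prompt format above: every exemplar label is inverted, so the in-context mapping contradicts the semantic prior, while the query is left untouched.

```python
# Flipped-label ICL sketch (illustrative data and format, not the paper's
# exact protocol): invert every exemplar label so the in-context mapping
# contradicts the semantic prior learned in pretraining.
FLIP = {"positive": "negative", "negative": "positive"}

exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
prompt = "".join(f"Input: {x}\nLabel: {FLIP[y]}\n\n" for x, y in exemplars)
prompt += "Input: An instant classic.\nLabel:"
# A model that follows the in-context mapping should complete with
# "negative"; a model leaning on its priors will still say "positive".
```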

Semantically-Unrelated Label ICL Insights

In SUL-ICL, where labels bear no semantic connection to the inputs, large models cope with the absence of semantic priors far better than smaller ones. This indicates that large models can form new input-label mappings without relying on pretraining-induced semantic knowledge. For several tasks, exceeding random-guessing accuracy in the SUL-ICL setting requires substantial model scale, again pointing to an emergent property tied to model size.
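
A minimal sketch of the SUL-ICL setup, assuming the same illustrative data: the natural-language targets are replaced with semantically unrelated tokens such as foo/bar (the substitution the abstract describes), so the only route to the correct answer is the input-label mapping in the exemplars.

```python
# SUL-ICL sketch: replace semantic labels with unrelated tokens (foo/bar),
# forcing the model to learn the mapping from exemplars alone.
# Data and prompt format are illustrative assumptions.
RELABEL = {"positive": "foo", "negative": "bar"}

exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
prompt = "".join(f"Input: {x}\nLabel: {RELABEL[y]}\n\n" for x, y in exemplars)
prompt += "Input: An instant classic.\nLabel:"  # expected completion: "foo"
```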

Instruction Tuning and ICL

Instruction-tuned models perform better in SUL-ICL setups, suggesting a stronger capacity to learn input-label mappings from exemplars. At the same time, they are worse at overriding semantic priors in the flipped-label tests. Instruction tuning therefore strengthens both the reliance on semantic priors and the ability to learn new mappings, with the former effect dominating.

Implications on High-Dimensional Linear Classification

The study extends to high-dimensional linear classification tasks, showing that models beyond a certain scale can perform them in-context without any semantic priors. This broadens the reach of in-context learning beyond NLP tasks to abstract, symbolic prediction problems.
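
A hedged sketch of how a high-dimensional linear classification task can be posed as a SUL-ICL prompt. The dimension, sampling scheme, and serialization are assumptions for illustration, not the paper's exact protocol: points are drawn at random, labeled by the sign of a hidden linear function, and serialized as exemplars with unrelated labels.

```python
# In-context linear classification sketch with unrelated labels
# (dimension, sampling, and prompt format are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
d, n_exemplars = 8, 16
w = rng.normal(size=d)                     # hidden linear decision rule

def label(x):
    return "foo" if x @ w > 0 else "bar"   # sign of the linear function

points = rng.integers(-9, 10, size=(n_exemplars, d))
prompt = "".join(
    f"Input: {' '.join(map(str, p))}\nLabel: {label(p)}\n\n" for p in points
)
query = rng.integers(-9, 10, size=d)
prompt += f"Input: {' '.join(map(str, query))}\nLabel:"
```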

Concluding Remarks

The paper maps out the interplay between reliance on semantic priors and the learning of novel input-label mappings in in-context learning across model scales. The emergence of prior-overriding and mapping-learning abilities as models grow marks a step toward more flexible and adaptable language models, and it motivates continued research into how these capabilities evolve with scale, moving language models beyond a fixed reliance on pretraining-imparted knowledge.
