Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning (2305.14160v4)

Published 23 May 2023 in cs.CL and cs.LG

Abstract: In-context learning (ICL) emerges as a promising capability of LLMs by providing them with demonstration examples to perform diverse tasks. However, the underlying mechanism of how LLMs learn from the provided context remains under-explored. In this paper, we investigate the working mechanism of ICL through an information flow lens. Our findings reveal that label words in the demonstration examples function as anchors: (1) semantic information aggregates into label word representations during the shallow computation layers' processing; (2) the consolidated information in label words serves as a reference for LLMs' final predictions. Based on these insights, we introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL. The promising applications of our findings again validate the uncovered ICL working mechanism and pave the way for future studies.

Citations (133)
