Emergent Abilities of Large Language Models

(arXiv:2206.07682)
Published Jun 15, 2022 in cs.CL

Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of LLMs. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.

Overview

  • LLMs develop emergent abilities, i.e., unexpected new capabilities, as they increase in size.

  • These emergent abilities follow a phase-transition-like pattern: they appear only past a certain scale, after which performance improves markedly.

  • In prompt-based tasks without further training, LLMs show significant improvements only after surpassing a high threshold of parameters and training compute.

  • Researchers are exploring augmented prompting and fine-tuning techniques that become effective as LLMs scale up.

  • The study of LLMs' emergent abilities is crucial for understanding model predictability and may lead to the discovery of more sophisticated capabilities.

Emergent Abilities of LLMs

Introduction

LLMs have seen remarkable progress in recent years. A fascinating phenomenon observed in these models, particularly at larger scales, is the development of unexpected capabilities known as emergent abilities. These abilities do not manifest in smaller models but begin to appear in larger ones, so their performance trends defy simple extrapolation from smaller-scale results. Emergence in this context is defined as a qualitative change in behavior arising from a quantitative increase in a system, here the scale of the language model as measured by parameter count and the compute expended during training.

Emergent Abilities Defined

Emergence in LLMs is evident when there is a leap in model performance that cannot be predicted from the gains seen in smaller models. A distinctive attribute of emergent abilities is their phase-transition-like nature: below a certain scale, the model's performance on a task hovers near random chance, as if the model lacked the ability entirely; past that threshold, performance increases sharply. This behavior is akin to phase transitions in physics, where a change in state reveals properties that were not foreseeable beforehand. Notably, most dense Transformer LLMs follow this trend because their training compute typically scales in proportion to parameter count.
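To make the phase-transition pattern concrete, here is a minimal Python sketch. The scale and accuracy numbers are hypothetical, chosen only to illustrate the shape of an emergent scaling curve: near-chance performance that extrapolation would call flat, followed by a sharp jump past a scale threshold.

```python
# Hypothetical (training FLOPs, accuracy) points on a 4-way multiple-choice
# task -- illustrative numbers, not results from the paper.
scaling_curve = [
    (1e20, 0.26),  # ~random chance (0.25)
    (1e21, 0.24),
    (1e22, 0.27),  # still flat: extrapolating these predicts no ability
    (1e23, 0.55),  # past the threshold: performance jumps sharply
    (1e24, 0.72),
]

RANDOM_CHANCE = 0.25  # baseline for a 4-way multiple-choice task

def emergence_threshold(curve, margin=0.10):
    """Return the smallest scale whose accuracy clearly exceeds chance."""
    for flops, accuracy in curve:
        if accuracy > RANDOM_CHANCE + margin:
            return flops
    return None  # the ability has not emerged at the scales measured

print(f"Ability emerges at roughly {emergence_threshold(scaling_curve):.0e} FLOPs")
```

The point of the sketch is the shape, not the numbers: any fit to the sub-threshold points would predict continued chance-level performance, which is exactly why emergent abilities evade extrapolation.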

Observations in Prompt-Based Tasks

The unpredictability of emergent abilities is particularly striking in prompting settings, where an LLM produces responses to a given input without any further training. A prime example is the improvement on few-shot prompted tasks that LLMs such as PaLM and GPT-3 exhibit only after reaching very large parameter counts and training compute. These improvements were recorded across a battery of tasks, from arithmetic to transliteration, indicating a broad spectrum of emergent abilities.
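As an illustration of the few-shot prompting setup described above, the sketch below assembles a prompt for two-digit addition. The exemplars and Q/A format are hypothetical stand-ins for the kinds of prompts used in such evaluations; no real model API is invoked.

```python
# Illustrative few-shot prompt for two-digit addition, in the spirit of
# the paper's prompting setup. Exemplars are hypothetical.
exemplars = [
    ("What is 12 + 34?", "46"),
    ("What is 57 + 28?", "85"),
    ("What is 40 + 23?", "63"),
]

def build_few_shot_prompt(question: str) -> str:
    """Concatenate solved exemplars, then the unsolved question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_few_shot_prompt("What is 43 + 39?"))
# Below the scale threshold, a model's completion here is near random;
# past it, accuracy rises well above chance.
```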

Augmented Prompting Techniques

Besides the raw scaling of LLMs, researchers have also investigated various enhanced prompting and fine-tuning strategies, which qualify as emergent abilities if they are detrimental or have no effect at smaller scales and only help once a certain scale is reached. Examples include chain-of-thought prompting, which facilitates multi-step reasoning, and scratchpad techniques that assist with sequential computation. Model calibration methods have likewise been observed to be effective only at larger scales.
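To show what distinguishes chain-of-thought prompting from standard prompting, here is a small sketch contrasting the two exemplar styles. The word problem and its wording are illustrative, not taken from the paper; the key difference is that the chain-of-thought answer spells out intermediate steps.

```python
# Contrast between a standard exemplar and a chain-of-thought exemplar.
# Problem text is illustrative. The paper reports that chain-of-thought
# prompting helps only at sufficient scale and can hurt smaller models.
question = (
    "Q: A cafeteria had 23 apples. It used 20 to make lunch and bought "
    "6 more. How many apples does it have?"
)

standard_answer = "A: 9"  # exemplar gives only the final result

cot_answer = (  # exemplar walks through the intermediate reasoning
    "A: The cafeteria started with 23 apples. After using 20, it had "
    "23 - 20 = 3. Buying 6 more gives 3 + 6 = 9. The answer is 9."
)

print(question, cot_answer, sep="\n")
```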

Conclusion

The research frontier for LLMs includes mapping the limits of their emergent abilities, especially since these capabilities challenge our current understanding of how model performance can be predicted. The premise is that further scaling may surface even more sophisticated abilities. However, emergence might also be achievable without simply increasing model scale, for instance through improved architectures, training methods, higher-quality data, or tasks that target current model weaknesses. These findings underscore the need for the computational linguistics community to investigate the causes and dynamics of emergent behavior in LLMs.
