Exploring Length Generalization in Large Language Models (2207.04901v2)

Published 11 Jul 2022 in cs.CL and cs.LG

Abstract: The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based LLMs. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained LLMs' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping LLMs with the ability to generalize to longer problems.
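As a rough illustration of the scratchpad prompting described in the abstract, the sketch below builds a few-shot prompt in which each short example writes out its intermediate steps before the answer, and the final query is longer than any example. The parity task, the step wording, and the helper names `parity_scratchpad` and `build_prompt` are assumptions chosen for illustration, not the authors' exact prompt format.

```python
from typing import List


def parity_scratchpad(bits: List[int]) -> str:
    """Write the running parity after each bit, then the final answer."""
    steps = []
    running = 0
    for i, b in enumerate(bits, start=1):
        running ^= b  # XOR keeps track of whether we have seen an odd number of 1s
        steps.append(f"step {i}: bit={b}, parity so far={running}")
    return "\n".join(steps) + f"\nAnswer: {running}"


def build_prompt(examples: List[List[int]], query: List[int]) -> str:
    """Few-shot prompt: solved short examples followed by an unsolved query."""
    blocks = []
    for bits in examples:
        header = "Input: " + " ".join(map(str, bits))
        blocks.append(header + "\n" + parity_scratchpad(bits))
    # The query is left open so the model continues with its own scratchpad.
    blocks.append("Input: " + " ".join(map(str, query)) + "\n")
    return "\n\n".join(blocks)


# Short in-context examples (lengths 3 and 4), longer query (length 8).
prompt = build_prompt([[1, 0, 1], [0, 1, 1, 1]], [1, 1, 0, 1, 0, 0, 1, 1])
print(prompt)
```

The resulting string would be sent to a pretrained model as-is; length generalization is then measured by whether the model's continuation stays correct on queries longer than the in-context examples.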

Citations (136)

