Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

(2201.11903)
Published Jan 28, 2022 in cs.CL and cs.AI

Abstract

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of LLMs to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three LLMs show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

Large language models use chain-of-thought prompting for complex arithmetic and symbolic reasoning tasks.

Overview

  • Examines how 'chain-of-thought' prompting aids LLMs in complex reasoning tasks.

  • Reveals the methodology of using intermediate steps in prompts to break down problems into solvable components.

  • Shows empirical evidence of increased accuracy in tasks such as arithmetic and commonsense reasoning with this method.

  • Demonstrates that the effectiveness of chain-of-thought prompting is magnified with larger language models.

  • Suggests that standard prompting may underestimate LLM potential and advanced prompts could significantly enhance capabilities.

Chain-of-Thought Prompting

Introduction

The paper under review examines how LLMs, specifically when applied to tasks that require complex reasoning, can benefit from a prompting strategy known as "chain-of-thought" prompting. The methodology diverges from standard prompting by incorporating explicit intermediate reasoning steps, improving the model's ability to work through intricate problems that would otherwise present significant challenges. In effect, it bridges the linguistic structure of the prompt and the reasoning capabilities of large models such as PaLM 540B.

The Methodology

With chain-of-thought prompting, each exemplar in the prompt is a triple: an input (the problem description), a chain of thought (a series of natural language intermediate reasoning steps), and an output (the final answer). The intermediate steps mirror the thought process a human might employ when tackling a complex problem. The paper's key insight is that these chains of thought do not serve simply as a means to enhance interpretability; rather, they systematically improve the model's capability to approach problems by breaking them down into solvable components.
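
The sketch below illustrates how such a few-shot prompt might be assembled from exemplar triples. It is a minimal Python illustration, not code from the paper: the `build_prompt` helper and the dictionary field names are hypothetical, and the exemplar text loosely follows the worked tennis-ball example the paper uses to introduce the method.

```python
# Minimal sketch: assembling a few-shot chain-of-thought prompt from
# exemplar triples (input, chain of thought, output). Helper and field
# names are hypothetical; the exemplar text loosely follows the paper's
# worked tennis-ball example.

exemplars = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "chain_of_thought": (
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        "answer": "11",
    },
    # ... further exemplar triples (the paper uses eight for GSM8K) ...
]

def build_prompt(exemplars, new_question):
    """Concatenate solved exemplars, then append the unsolved question."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['chain_of_thought']} The answer is {ex['answer']}.\n"
        )
    parts.append(f"Q: {new_question}\nA:")
    return "\n".join(parts)
```

The model is then expected to continue the final "A:" with its own chain of thought before stating the answer, which is where the benefit over standard, answer-only prompting arises.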

Empirical Findings

The empirical results presented in the paper establish the efficacy of chain-of-thought prompting across several benchmarks. For arithmetic reasoning on the GSM8K benchmark, a notable increase in accuracy is observed when employing chain-of-thought prompting versus standard prompting, particularly with the 540B-parameter language model. Comparable improvements are observed in both commonsense and symbolic reasoning tasks, underscoring the method's effectiveness. Notably, chain-of-thought prompting becomes more effective with increasing model scale, highlighting a critical interaction between the method and model size.
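
Because a chain-of-thought completion ends in free text rather than a bare label, scoring such benchmarks typically extracts the final answer before comparing it to the reference. The snippet below is a rough sketch of that post-processing; `extract_final_answer` and `accuracy` are hypothetical helpers, and the paper's exact parsing rules may differ.

```python
import re

def extract_final_answer(completion: str):
    """Pull the last number from a chain-of-thought completion (hypothetical parsing)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

def accuracy(completions, references):
    """Fraction of problems whose extracted answer matches the reference."""
    correct = sum(
        extract_final_answer(c) == str(r)
        for c, r in zip(completions, references)
    )
    return correct / len(references)
```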

In ablation and robustness experiments, chain-of-thought prompting also exhibits resilience across different exemplars, annotators, and exemplar orderings, albeit with expected variability. The approach consistently outperforms standard prompting irrespective of these factors.
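
One way to probe this sensitivity is to re-score the same evaluation set under several shuffled orderings of the exemplars. The sketch below assumes a caller-supplied `evaluate_once` function, a hypothetical stand-in for building prompts, querying the model, and scoring; it illustrates the idea rather than reproducing the paper's exact protocol.

```python
import random

def permutation_robustness(exemplars, evaluate_once, n_trials=5, seed=0):
    """Score the same task under several shuffled exemplar orderings.

    `evaluate_once(ordered_exemplars)` is a hypothetical callable that builds
    prompts from the given exemplar order, queries the model, and returns an
    accuracy score. A narrow spread across trials suggests robustness to
    exemplar order.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shuffled = list(exemplars)
        rng.shuffle(shuffled)
        scores.append(evaluate_once(shuffled))
    return scores
```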

Conclusions and Implications

Overall, this research positions chain-of-thought prompting as a promising tool for enhancing LLMs' reasoning capabilities. By integrating simple, natural language intermediate steps, the models can effectively tackle a range of problems that require higher-order thinking. This finding suggests that standard prompting underestimates the true potential of LLMs and that model capabilities can be significantly unlocked through appropriate prompting techniques. While acknowledging imperfections and areas for future inquiry, such as improving factuality in reasoning and broadening task applicability, the paper concludes that chain-of-thought prompting is a strategic step towards realizing the full potential of LLMs in complex reasoning tasks.
