Self-Consistency Improves Chain of Thought Reasoning in Language Models

(2203.11171)
Published Mar 21, 2022 in cs.CL and cs.AI

Abstract

Chain-of-thought prompting combined with pre-trained LLMs has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

A method enhancing language model outputs with diverse reasoning paths for consistent final answers.

Overview

  • The paper introduces self-consistency, a novel method to enhance reasoning in LLMs by generating multiple reasoning paths and aggregating them to find the most consistent answer.

  • Self-consistency diverges from traditional greedy decoding by employing a sample-and-marginalize decoding strategy, acting as a 'self-ensemble' approach without the need for extra training or annotations.

  • Empirically tested on various benchmarks (GSM8K, SVAMP, CommonsenseQA) using models like UL2, GPT-3, LaMDA, and PaLM, self-consistency outperformed chain-of-thought prompting in improving reasoning accuracy.

  • The approach suggests potential for broader application in reasoning tasks, pointing toward models innately equipped with enhanced reasoning capabilities without additional training, fine-tuning, or auxiliary verifiers.

Enhancing Reasoning in Language Models through Self-Consistent Decoding

Introduction to Self-Consistency

Recent work in NLP has highlighted the capabilities of chain-of-thought prompting in improving reasoning across a variety of tasks when paired with LLMs. Building on this, the paper introduces an approach named self-consistency, aimed at bolstering reasoning further by exploiting the diversity of possible reasoning paths. Unlike conventional methods that rely on greedy decoding, self-consistency samples multiple reasoning paths from the model's decoder and aggregates them to deduce the most consistent answer. The method rests on the intuition that a complex reasoning problem typically admits multiple valid ways of thinking that lead to the same correct answer, and it markedly enhances the model's ability to reason, as confirmed by empirical evaluations across several arithmetic and commonsense reasoning benchmarks.

Unveiling Self-Consistency

The uniqueness of self-consistency lies in its sample-and-marginalize decoding strategy. It commences by sampling a diverse set of reasoning paths, diverging from the single path produced by greedy decoding. Following this generation step, it applies an aggregation mechanism that selects the answer with the most consensus across paths. This process is akin to ensemble methods but operates within a single model, rendering it a "self-ensemble" approach. The method's simplicity, and the fact that it requires no additional training or manual annotation, distinguish it from prior methods that depend on trained verifiers or re-rankers.
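To make the sample-and-marginalize idea concrete, the sketch below shows one way it could be wired up. Here `sample_fn` and `extract_answer` are hypothetical placeholders, standing in for a sampling call to the language model and for a parser that pulls the final answer out of a generated chain of thought; the sample count and temperature are illustrative defaults rather than values prescribed by the paper.

```python
from collections import Counter

def self_consistency_answer(prompt, sample_fn, extract_answer,
                            num_samples=40, temperature=0.7):
    """Minimal sketch of self-consistency decoding.

    sample_fn(prompt, temperature) is assumed to return one sampled
    chain-of-thought completion from the underlying model, and
    extract_answer(completion) is assumed to parse the final answer
    from that completion. Both are placeholders, not a specific API.
    """
    answers = []
    for _ in range(num_samples):
        # Sample a reasoning path with temperature sampling instead of
        # taking the single greedy decode.
        completion = sample_fn(prompt, temperature=temperature)
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)

    # Marginalize out the reasoning paths: return the most frequent
    # final answer across all sampled paths (majority vote).
    return Counter(answers).most_common(1)[0][0] if answers else None
```

The key design choice is that the individual reasoning paths are never judged directly; they are only used to cast votes for their final answers, so an occasional flawed path is outweighed by the consensus.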

Empirical Insights and Advancements

Self-consistency was evaluated on standard tasks, including GSM8K, SVAMP, and CommonsenseQA, using models like UL2, GPT-3, LaMDA, and PaLM. The results were compelling; self-consistency consistently outperformed chain-of-thought prompting across all tasks and models. Notably, when used with GPT-3 or PaLM, it set new performance benchmarks, with gains as substantial as +17.9% in absolute accuracy on GSM8K. These findings underscore self-consistency's effectiveness in improving reasoning processes of LLMs across a spectrum of reasoning tasks.

Theoretical and Practical Implications

The exploration of self-consistency augments our understanding of language models' reasoning capabilities and introduces a novel lens to examine how diversity in reasoning can enhance performance. Practically, it offers a streamlined and efficient approach to leverage LLMs for complex reasoning tasks without necessitating additional computation-intensive methods like training classifiers or re-rankers. Moreover, the consistent performance improvements across varying model scales and benchmarks signal its robust applicability in real-world scenarios.

Outlook on Future Directions

While self-consistency marks a significant leap in utilizing LLMs for reasoning, it predominantly benefits tasks with definitive answers. Its extension to more open-ended problems remains an area for future exploration. Additionally, the workings of self-consistency open avenues to study the calibration of LLMs further, especially in gauging their confidence in generated responses. Lastly, the integration of self-consistency with model training processes to yield models innately equipped with enhanced reasoning capabilities presents a compelling direction for research.

Conclusion

In summary, self-consistency introduces a ground-breaking approach to improve reasoning in LLMs. By harnessing the diversity of reasoning paths and the aggregation of consistent answers, it markedly exceeds the performance limits set by existing methods. This approach not only heralds a new era of efficiency and accuracy in reasoning tasks but also illuminates pathways for future investigations into the intricate reasoning abilities of AI systems.
