
Automatic Chain of Thought Prompting in Large Language Models

(2210.03493)
Published Oct 7, 2022 in cs.CL and cs.AI

Abstract

LLMs can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstrations one by one. We show that such manual efforts may be eliminated by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one, i.e., let's think not just step by step, but also one by one. However, these generated chains often come with mistakes. To mitigate the effect of such mistakes, we find that diversity matters for automatically constructing demonstrations. We propose an automatic CoT prompting method: Auto-CoT. It samples questions with diversity and generates reasoning chains to construct demonstrations. On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations. Code is available at https://github.com/amazon-research/auto-cot


Overview

  • The paper introduces Auto-CoT, a method for automating Chain-of-Thought (CoT) prompting to enhance the reasoning abilities of LLMs without requiring manually crafted demonstrations.

  • It contrasts two traditional CoT prompting paradigms: Zero-Shot-CoT, which is task-agnostic but less reliable, and Manual-CoT, which offers higher accuracy but is labor-intensive and not scalable.

  • Auto-CoT operationalizes the insight that question diversity is key to mitigating errors in LLM-generated reasoning chains, using clustering based on semantic similarity and selection of representative questions to construct diverse demonstrations.

  • Experimental evaluation across ten benchmark reasoning tasks shows that Auto-CoT matches or surpasses Manual-CoT in reasoning accuracy, offering a scalable and flexible solution that reduces manual effort.

Automatic Chain of Thought Prompting in LLMs

Introduction

The emergence of chain-of-thought (CoT) prompting as an effective strategy for enhancing the reasoning abilities of LLMs marks a significant advance in natural language understanding and reasoning. CoT prompting decomposes complex questions into intermediate reasoning steps that ultimately lead to an answer. This paper presents Auto-CoT, an approach that automates the construction of demonstrations for CoT prompting, eliminating the need for manually designed demonstrations, which are labor-intensive to produce and do not transfer readily across reasoning tasks.

Chain-of-Thought Prompting Paradigms

CoT prompting can be broadly categorized into two paradigms: Zero-Shot-CoT and Manual-CoT. Zero-Shot-CoT uses a single, general trigger prompt (e.g., "Let's think step by step") to elicit reasoning chains from an LLM without any task-specific input-output demonstrations. Although simple and task-agnostic, this paradigm is less reliable because it depends entirely on the LLM's innate reasoning ability. Manual-CoT, by contrast, supplies hand-crafted demonstrations for each reasoning task and achieves higher accuracy by scaffolding the LLM's reasoning process. However, Manual-CoT does not scale: designing demonstrations for every new reasoning task requires significant manual effort.
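To make the contrast concrete, the following minimal sketch shows how prompts in the two paradigms are typically assembled. The templates, the example demonstration, and the test question are illustrative placeholders, not the exact prompts used in the paper.

```python
# Minimal sketch (not the authors' exact templates) contrasting the two
# CoT prompting paradigms described above.

def zero_shot_cot_prompt(question: str) -> str:
    """Task-agnostic prompt: a single trigger phrase elicits a reasoning chain."""
    return f"Q: {question}\nA: Let's think step by step."

def manual_cot_prompt(demonstrations: list[tuple[str, str, str]], question: str) -> str:
    """Few-shot prompt built from hand-crafted (question, rationale, answer) triples."""
    parts = []
    for demo_q, rationale, answer in demonstrations:
        parts.append(f"Q: {demo_q}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    demos = [
        ("If there are 3 cars and 2 more arrive, how many cars are there?",
         "There are 3 cars initially. 2 more arrive, so 3 + 2 = 5.",
         "5"),
    ]
    print(zero_shot_cot_prompt("A baker made 12 muffins and sold 7. How many are left?"))
    print()
    print(manual_cot_prompt(demos, "A baker made 12 muffins and sold 7. How many are left?"))
```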

The Proposal of Auto-CoT

In response to the limitations of existing approaches, this paper introduces Auto-CoT, an automatic CoT prompting method that leverages the strengths of both paradigms while addressing their shortcomings. The core insight of Auto-CoT is the observation that diversity in demonstration questions is crucial for mitigating the impact of errors in LLM-generated reasoning chains. Auto-CoT operationalizes this insight through two primary steps: clustering questions based on semantic similarity and then selecting representative questions from each cluster to construct diverse demonstrations. This strategy not only enhances the reasoning performance of LLMs but also offers a scalable and flexible solution adaptable to various reasoning tasks without the need for manual demonstration crafting.
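The sketch below outlines this two-step pipeline, assuming Sentence-BERT sentence embeddings and k-means clustering, with a caller-supplied `generate` function standing in for the LLM call. The encoder name, the length heuristics, and the demonstration format are assumptions for illustration rather than the authors' exact settings.

```python
# Sketch of the Auto-CoT demonstration-construction pipeline, assuming
# Sentence-BERT embeddings and k-means clustering; the specific encoder,
# the `generate` LLM call, and the length heuristics are illustrative
# placeholders, not the authors' exact configuration.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency
from sklearn.cluster import KMeans

def build_auto_cot_demos(questions, num_demos, generate):
    """Cluster questions, pick one representative per cluster, and let the LLM
    write its own rationale via the Zero-Shot-CoT trigger."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
    embeddings = encoder.encode(questions)

    kmeans = KMeans(n_clusters=num_demos, random_state=0).fit(embeddings)

    demos = []
    for cluster_id in range(num_demos):
        # Questions in this cluster, ordered by distance to the centroid.
        idx = np.where(kmeans.labels_ == cluster_id)[0]
        dists = np.linalg.norm(
            embeddings[idx] - kmeans.cluster_centers_[cluster_id], axis=1
        )
        for i in idx[np.argsort(dists)]:
            q = questions[i]
            rationale = generate(f"Q: {q}\nA: Let's think step by step.")
            # Simple quality heuristics (illustrative): prefer short questions
            # and short rationales, since long chains are more error-prone.
            if len(q.split()) <= 60 and rationale.count(".") <= 5:
                # A full pipeline would also extract the final answer (e.g., via
                # a follow-up "Therefore, the answer is" prompt) and append it.
                demos.append((q, rationale))
                break
    return demos
```

Selecting the question closest to each cluster centroid keeps the demonstration set diverse: each demonstration represents a different region of the question distribution, which limits how far any single erroneous reasoning chain can mislead the model.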

Experimental Evaluation

Auto-CoT was evaluated across ten benchmark reasoning tasks, spanning arithmetic reasoning, commonsense reasoning, and symbolic reasoning. The experiments demonstrated that Auto-CoT consistently matches or outperforms the Manual-CoT paradigm in terms of reasoning accuracy. This finding is especially noteworthy given that Auto-CoT requires no manual effort in designing task-specific demonstrations, representing a significant efficiency improvement over existing methods.

Implications and Future Directions

The development of Auto-CoT signifies a promising direction for leveraging LLMs in complex reasoning tasks without necessitating exhaustive manual efforts. By automating the demonstration construction process and highlighting the importance of diversity in demonstrations, Auto-CoT presents an adaptable approach that could be extended to a wider range of reasoning tasks beyond those examined in this study. Future research could explore more sophisticated clustering and selection algorithms to further refine the quality of automatically constructed demonstrations and investigate the application of Auto-CoT in real-world reasoning and decision-making scenarios.

In conclusion, Auto-CoT advances the state-of-the-art in CoT prompting by automating the construction of demonstrations, thereby reducing manual labor and enhancing the scalability of deploying LLMs for complex reasoning tasks. This work not only contributes to our understanding of effective prompting strategies for LLMs but also paves the way for broader applications of LLMs in reasoning-intensive domains.
