SELF-EXPLAIN: Teaching LLMs to Reason Complex Questions by Themselves

Abstract

LLMs can generate intermediate reasoning steps. To elicit reliable reasoning, the common practice is to employ few-shot chain-of-thought (CoT) prompting, where several in-context reasoning demonstrations are prepended to the question. However, such chain-of-thought examples are expensive to craft, especially for professional domains, and can vary widely across human annotators. This work therefore investigates whether LLMs can teach themselves to reason without human-crafted demonstrations. We propose SELF-EXPLAIN, which has LLMs generate their own CoT examples, inspired by "encoding specificity" in human memory retrieval. We find that using self-explanations makes LLMs more confident, better calibrated, and less biased when answering complex questions. Moreover, prompting with self-explanations can even significantly outperform prompting with human-crafted CoTs on several complex question answering datasets.

Figure: Framework of SELF-EXPLAIN, which generates self-explanations to serve as in-context exemplars during testing.

Overview

  • The SELF-EXPLAIN framework allows LLMs to generate their own Chain-of-Thought (CoT) examples, enhancing reasoning without human-crafted inputs.

  • Experiments on datasets like MedMCQA and StrategyQA show that SELF-EXPLAIN outperforms human-crafted CoTs, improving test accuracy and model calibration.

  • The method has significant implications for high-stakes domains like healthcare, providing more reliable and accessible AI-driven decision support while reducing biases.

Teaching LLMs to Reason with SELF-EXPLAIN

The paper "SELF-EXPLAIN: Teaching LLMs to Reason Complex Questions by Themselves" by Jiachen Zhao, Zonghai Yao, Zhichao Yang, and Hong Yu makes significant strides in the domain of prompting LLMs like GPT-3.5 to generate intermediate reasoning steps without the need for human-crafted demonstrations. This goal addresses the challenges and limitations associated with the creation and application of human-crafted Chain-of-Thought (CoT) exemplars, which are traditionally employed to enhance the reasoning capabilities of LLMs.

Introduction

The study begins by recognizing that while LLMs can learn patterns from in-context exemplars, known as in-context learning (ICL), eliciting intermediate reasoning steps via CoT prompting often yields higher performance. However, designing CoT examples is labor-intensive, particularly in professional domains such as medicine where domain-specific expertise is required, and variance among human annotators can lead to inconsistencies in the crafted CoTs.
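For concreteness, the sketch below illustrates how a few-shot CoT prompt is typically assembled: each exemplar prepends a reasoning trace before its answer, followed by the test question. The exemplar content and template wording here are invented for illustration and are not taken from the paper.

```python
# Hypothetical few-shot CoT prompt construction. The exemplar and the
# template phrasing are illustrative only; the paper's prompts may differ.
exemplars = [
    {
        "question": "Which vitamin deficiency causes scurvy?",
        "rationale": "Scurvy results from impaired collagen synthesis, "
                     "which requires vitamin C as a cofactor.",
        "answer": "Vitamin C",
    },
]

def build_cot_prompt(exemplars, test_question):
    """Prepend (question, rationale, answer) exemplars to the test question."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Answer: Let's think step by step. {ex['rationale']} "
            f"So the answer is {ex['answer']}.\n"
        )
    parts.append(f"Question: {test_question}\nAnswer: Let's think step by step.")
    return "\n".join(parts)
```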

SELF-EXPLAIN Framework

The proposed method, SELF-EXPLAIN, has LLMs generate their own CoT examples, inspired by the concept of encoding specificity in human memory retrieval. SELF-EXPLAIN enables LLMs to produce explanations that make them more confident, better calibrated, and less biased when handling complex questions. The generated self-explanations serve as in-context CoTs and have demonstrated performance that even surpasses human-crafted CoTs in several knowledge-intensive domains.

The SELF-EXPLAIN framework operates in three primary stages:

  1. Generation of Self-Explanations: Given a training question and its answer, the LLM generates a CoT explanation drawing on its own encoded knowledge.
  2. In-Context Learning: These self-generated CoTs are used as exemplars for ICL during testing (see the sketch after this list).
  3. Performance Comparison: The efficacy of self-explanations is evaluated against human-crafted CoTs.
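A minimal sketch of how the first two stages could be wired together is shown below. The `call_llm` helper is a stand-in for whatever LLM API is used (the paper experiments with GPT-3.5), and the prompt templates are assumptions rather than the paper's exact wording.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for an LLM API call; replace with an actual client."""
    raise NotImplementedError

def generate_self_explanation(question: str, answer: str) -> str:
    """Stage 1: ask the model to explain a known (question, answer) pair
    from the training data, drawing only on its own encoded knowledge."""
    prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Explain step by step why this answer is correct."
    )
    return call_llm(prompt)

def answer_with_self_explanations(exemplars, test_question: str) -> str:
    """Stage 2: use the self-generated explanations as in-context CoT exemplars."""
    demos = "\n\n".join(
        f"Question: {q}\nExplanation: {expl}\nAnswer: {a}"
        for q, a, expl in exemplars
    )
    prompt = f"{demos}\n\nQuestion: {test_question}\nExplanation:"
    return call_llm(prompt)
```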

Experimental Setup and Results

The authors perform rigorous experiments on datasets that demand intricate reasoning, such as MedMCQA, MedQA, and StrategyQA. These datasets include multiple-choice questions that require deep domain knowledge and logical reasoning.

The results reported in Table 1 show that CoT prompting substantially enhances performance across the datasets. Notably, SELF-EXPLAIN achieves higher accuracy than both zero-shot CoT and Auto-CoT and even surpasses human-crafted CoTs. For instance, test accuracy on MedMCQA improved to 56.6%, compared to 53.1% with human-crafted CoTs, highlighting the potential of self-explanation.

Calibration and Bias

Another key finding is that LLMs exhibit higher confidence and are better calibrated when prompted with self-explanations. Figures 3 and 4 in the study illustrate that self-explanations reduce the intrinsic biases observed when human-crafted CoTs are used. This calibration and reduced bias could be critical in real-world applications where user trust and reliability are vital.
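Calibration here refers to how well the model's stated confidence tracks its actual accuracy. The paper's exact metric is not reproduced here, but a standard way to quantify it is the expected calibration error (ECE), sketched below under the assumption that per-answer confidences are available.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Binned ECE: the average |accuracy - confidence| gap, weighted by bin size.
    `confidences` holds the model's probabilities for its chosen answers;
    `correct` is a boolean array marking whether each answer was right."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```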

Implications

Practical Implications:

  • Healthcare and Specialized Knowledge Domains: In professional domains where expertise is scarce and expensive, SELF-EXPLAIN can significantly lower costs and improve access to high-quality AI guidance in decision-making processes.
  • Model Confidence and Bias: Implementing self-explanation prompts can lead to more reliable AI systems with better-calibrated outputs, which is crucial for user trust in high-stakes environments.

Theoretical Implications:

  • Encoding Specificity Hypothesis: The success of SELF-EXPLAIN supports the encoding specificity hypothesis, suggesting that LLMs retrieve and apply knowledge more effectively when the test-time context aligns with how that knowledge was encoded during pre-training.
  • Generalization: The work challenges the entrenched belief that human-crafted CoTs are superior, proposing that machine-generated CoTs, driven by properly framed prompts, can achieve or even surpass human-generated reasoning in certain contexts.

Future Directions

Future research could expand on several fronts:

  1. Diverse Domains and Models: Extending the SELF-EXPLAIN method to different domains and testing its efficacy on other LLMs could validate its robustness.
  2. Optimization of Prompt Generation: Refining the prompt-generation process so it accommodates varied input-output relationships across tasks.
  3. User Interaction: Investigating user interactions with LLMs using self-explanations to further understand trust and decision reliance on machine-generated insights.

In conclusion, "SELF-EXPLAIN" offers a novel approach that enables LLMs to autonomously generate intermediate reasoning steps, demonstrating not only superior performance to human-crafted CoTs but also potentially shifting the paradigm in how machine intelligence can learn and convey complex information. This paves the way for more accessible, reliable, and cost-effective AI applications across a myriad of domains.
