RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought (2305.11499v2)

Published 19 May 2023 in cs.CL

Abstract: LLMs have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs face challenges in maintaining factual consistency during reasoning, exhibiting tendencies to condition overlooking, question misinterpretation, and condition hallucination over given problems. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLMs' generated solutions. To detect factual inconsistency, RCoT first asks LLMs to reconstruct the problem based on generated solutions. Then fine-grained comparisons between the original problem and the reconstructed problem expose the factual inconsistency in the original solutions. To rectify the solution, RCoT formulates detected factual inconsistency into fine-grained feedback to guide LLMs in revising solutions. Experimental results demonstrate improvements of RCoT over standard CoT, Self-Consistency and Self-Refine across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore the fine-grained feedback generation methods.

Summary

The paper introduces RCoT, a method that reconstructs the problem from an LLM's generated solution in order to detect and rectify factual inconsistencies in its reasoning.
It employs fine-grained feedback derived from discrepancies between original and reconstructed problems to guide error rectification.
Experiments on seven arithmetic datasets show RCoT outperforming standard CoT, Self-Consistency, and Self-Refine; separately, manually written fine-grained feedback lifts ChatGPT to 94.6% accuracy on GSM8K.

The paper "RCoT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought" introduces a method for improving the reasoning abilities of LLMs, particularly on arithmetic tasks. Although Chain-of-Thought (CoT) prompting improves LLM performance on these tasks, factual consistency remains a significant challenge: during step-by-step reasoning, models can overlook conditions, hallucinate conditions that the problem never states, or misinterpret the question.

Key Contributions and Methodology:

RCoT Framework: The authors propose the Reversing Chain-of-Thought (RCoT) method, which improves factual consistency by enabling LLMs to detect and rectify errors in their generated solutions. RCoT reconstructs the problem from the solution generated by the LLM. Differences between the original and reconstructed problems expose factual inconsistencies: hallucinated conditions, overlooked conditions, and misinterpreted questions. Fine-grained feedback derived from these discrepancies guides the LLM in revising its solution.
Problem Reconstruction: In RCoT, an LLM is first prompted to reconstruct the problem from the solution it produced initially. The reconstructed problem reflects the conditions and question that the solution actually relied on, so it can be checked against the original.
Fine-Grained Comparison: The method compares the conditions and the question of the original problem with those of the reconstructed problem, identifying specific instances of factual inconsistency.
Rectification Process: Detected factual inconsistencies are articulated into explicit feedback that guides the LLM to revise its reasoning approach. This process not only improves the solution's accuracy but also enhances interpretability by explicitly identifying reasoning errors.
Experimental Validation: The authors ran experiments across seven arithmetic datasets, including GSM8K, AQuA, and SVAMP. RCoT improved on standard CoT as well as Self-Consistency and Self-Refine, indicating that it mitigates factual inconsistencies. The authors also find that manually written fine-grained feedback improves accuracy dramatically; for example, ChatGPT reached 94.6% accuracy on GSM8K with such feedback.
Comparison to Baselines: RCoT showed superior performance and efficiency compared to methods like Self-Consistency, which involves multiple solution trials, highlighting RCoT's capacity for improving solutions at a reduced computational cost.
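
The reconstruct, compare, and revise steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `LLM` is a hypothetical prompt-to-completion callable, the tagged prompts (`SOLVE`, `RECONSTRUCT`, `LIST CONDITIONS`, `CHECK`, `REVISE`) are placeholders rather than the paper's prompts, and the helper names are invented for this example. For brevity it checks only overlooked and hallucinated conditions and omits the question comparison.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Inconsistency:
    kind: str    # "overlooked" or "hallucinated"
    detail: str  # the condition involved


def list_conditions(llm: LLM, problem: str) -> List[str]:
    """Ask the model to split a problem into conditions, one per line."""
    reply = llm("LIST CONDITIONS\n" + problem)
    return [line.strip() for line in reply.splitlines() if line.strip()]


def is_supported(llm: LLM, context: str, condition: str) -> bool:
    """Ask the model whether `condition` can be inferred from `context`."""
    reply = llm("CHECK\nContext: " + context + "\nCondition: " + condition)
    return reply.strip().lower().startswith("yes")


def detect(llm: LLM, original: str, reconstructed: str) -> List[Inconsistency]:
    """Compare the two problems condition by condition, in both directions."""
    issues = []
    # A condition of the original that the reconstruction lacks was overlooked.
    for cond in list_conditions(llm, original):
        if not is_supported(llm, reconstructed, cond):
            issues.append(Inconsistency("overlooked", cond))
    # A condition of the reconstruction that the original lacks was hallucinated.
    for cond in list_conditions(llm, reconstructed):
        if not is_supported(llm, original, cond):
            issues.append(Inconsistency("hallucinated", cond))
    return issues


def rcot(llm: LLM, problem: str) -> str:
    """Solve, reconstruct the problem from the solution, then revise if needed."""
    solution = llm("SOLVE\n" + problem)
    reconstructed = llm("RECONSTRUCT\n" + solution)
    issues = detect(llm, problem, reconstructed)
    if not issues:
        return solution  # nothing detected, keep the first solution
    # Turn each detected inconsistency into one line of fine-grained feedback.
    feedback = "\n".join(f"- {i.kind} condition: {i.detail}" for i in issues)
    return llm(
        "REVISE\nProblem: " + problem
        + "\nSolution: " + solution
        + "\nFeedback:\n" + feedback
    )
```

Keeping detection separate from revision means the feedback names the specific condition at fault, which is the contrast with coarse-grained feedback that only says whether the final answer is correct.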

Overall, the RCoT approach provides a structured methodology to enhance the factual reliability of reasoning tasks in LLMs, emphasizing the role of fine-grained feedback in error rectification. The findings encourage further exploration into automated fine-grained feedback generation for improving complex reasoning tasks in natural language processing. Future work may extend this method to other forms of reasoning tasks and seek to reduce inference times.
