LLMs cannot find reasoning errors, but can correct them given the error location (2311.08516v3)
Abstract: While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023b; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performance overall (Huang et al., 2023). In this paper, we show that poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than an inability to correct a known mistake. Firstly, we benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task, even in highly objective, unambiguous cases. Secondly, we test the correction abilities of LLMs -- separately from mistake finding -- using a backtracking setup that feeds ground truth mistake location information to the model. We show that this boosts downstream task performance across our 5 reasoning tasks, indicating that LLMs' correction abilities are robust. Finally, we show that it is possible to obtain mistake location information without ground truth labels or in-domain training data. We train a small classifier with out-of-domain data, which exhibits stronger mistake-finding performance than prompting a large model. We release our dataset of LLM-generated logical mistakes, BIG-Bench Mistake, to enable further research into locating LLM reasoning mistakes.
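To make the backtracking setup described in the abstract concrete, the sketch below shows one way to re-generate a chain-of-thought trace from a known mistake location: keep the steps before the mistake, re-sample the mistaken step, and continue greedily from the repaired prefix. The `generate_steps` callable and the temperature choices are illustrative assumptions, not the authors' exact implementation.

```python
from typing import Callable, List

# Hypothetical helper: given prompt text and a sampling temperature,
# returns a list of newly generated reasoning steps (an assumption for
# illustration, not an API from the paper).
GenerateFn = Callable[[str, float], List[str]]


def backtrack(question: str,
              trace: List[str],
              mistake_index: int,
              generate_steps: GenerateFn) -> List[str]:
    """Re-generate a chain-of-thought trace from a known mistake location.

    Steps before the mistake are kept verbatim; the step at the mistake
    location is re-sampled (here with a higher temperature so the model
    does not simply reproduce the same error), and the remainder of the
    trace is then generated greedily from the corrected prefix.
    """
    # 1. Keep every step before the first mistake.
    prefix = trace[:mistake_index]

    # 2. Re-sample the mistaken step. Temperature 1.0 is an assumption;
    #    any setting that encourages a different step would serve.
    prompt = question + "\n" + "\n".join(prefix)
    new_step = generate_steps(prompt, 1.0)[:1]

    # 3. Continue the trace greedily (temperature 0) from the new prefix.
    prompt = question + "\n" + "\n".join(prefix + new_step)
    continuation = generate_steps(prompt, 0.0)

    return prefix + new_step + continuation
```

In this framing, the mistake index can come either from ground truth labels (as in the paper's oracle experiments) or from a separate mistake-finding model such as the small classifier the abstract mentions.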
- PaLM 2 technical report. arXiv preprint arXiv:2305.10403.
- Iterative translation refinement with large language models. arXiv preprint arXiv:2306.03856.
- Andrew F Hayes and Klaus Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication methods and measures, 1(1):77–89.
- Large language models can self-improve. arXiv preprint arXiv:2210.11610.
- Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798.
- Language models can solve computer tasks. arXiv preprint arXiv:2303.17491.
- Let’s verify step by step. arXiv preprint arXiv:2305.20050.
- Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651.
- SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning. arXiv preprint arXiv:2308.00436.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv preprint arXiv:2308.03188.
- Self-critiquing models for assisting human evaluators. arXiv preprint arXiv:2206.05802.
- Reflexion: Language agents with verbal reinforcement learning.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research.
- BIG-Bench-Hard/cot-prompts. https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts. Accessed: 2023-10-31.
- Self-consistency improves chain of thought reasoning in language models. In ICLR 2023.
- Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837. Curran Associates, Inc.
- ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.