Abstract

Self-correction is an approach to improving responses from LLMs by refining the responses using LLMs during inference. Prior work has proposed various self-correction frameworks using different sources of feedback, including self-evaluation and external feedback. However, there is still no consensus on the question of when LLMs can correct their own mistakes, as recent studies also report negative results. In this work, we critically survey a broad range of papers and discuss the conditions required for successful self-correction. We first find that prior studies often do not define their research questions in detail and involve impractical frameworks or unfair evaluations that over-evaluate self-correction. To tackle these issues, we categorize research questions in self-correction research and provide a checklist for designing appropriate experiments. Our critical survey based on the newly categorized research questions shows that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs in general tasks, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.

Figure: LLM self-correction frameworks, categorized by feedback information and use of optimal initial responses.

Overview

  • The paper critically surveys the conditions under which LLMs can self-correct, aiming to clarify and consolidate existing research in this area.

  • It categorizes the research questions into two groups, analyzes prior work, and provides a comprehensive checklist for designing self-correction experiments, emphasizing the need for clearly defined research questions and fair evaluation.

  • The study covers intrinsic self-correction, self-correction with external feedback, and the use of fine-tuning, highlighting the challenges and future directions for effective self-correction mechanisms in LLMs.

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

The paper "When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs" by Ryo Kamoi et al. addresses the ongoing debate within the research community regarding the self-correction capabilities of LLMs. Given the proliferation of self-correction frameworks, this survey critically examines the conditions under which such frameworks succeed or fail, providing an essential resource for researchers pursuing advancements in this area.

Key Contributions

The primary contributions of the paper can be summarized as follows:

  1. Categorization of Research Questions: The paper categorizes research questions in self-correction into two main groups:

    • RQ Group A: Can LLMs self-correct their best-possible initial responses with or without external feedback?
    • RQ Group B: Are the final outputs from self-correction frameworks superior to those generated by other methods?
  2. Analysis of Prior Work: The authors analyze prior studies and identify key deficiencies, such as unclear definitions of research questions and the use of impractical or unfair evaluation frameworks.
  3. Checklist for Experimental Design: The authors provide a comprehensive checklist for designing experiments that verify specific research questions in self-correction research. The checklist includes requirements such as not using oracle information (e.g., gold answers to decide when to stop correcting) and ensuring strong initial prompts; a minimal illustration follows this list.
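To make the oracle-information requirement concrete, the following is a minimal sketch (not from the paper) contrasting a fair, oracle-free stop condition with an unfair one that peeks at the gold answer. The `llm` function, the prompts, and `gold_answer` are hypothetical placeholders.

```python
# Sketch contrasting stop conditions for a self-correction loop.
# `llm` is a hypothetical LLM call; prompts are illustrative only.

def should_stop_fair(llm, question: str, response: str) -> bool:
    # Fair: the model itself judges correctness, with no gold labels.
    verdict = llm(
        f"Question: {question}\nAnswer: {response}\n"
        "Is this answer correct? Reply yes or no:"
    )
    return verdict.strip().lower().startswith("yes")

def should_stop_unfair(response: str, gold_answer: str) -> bool:
    # Unfair: stopping once the response matches the gold answer leaks
    # oracle information and inflates self-correction performance.
    return response.strip() == gold_answer.strip()
```

The unfair variant can never revise a correct answer into a wrong one, so it over-evaluates self-correction relative to any realistic deployment.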

Frameworks of Self-Correction

The paper discusses various frameworks of self-correction under different conditions:

Intrinsic Self-Correction:

  • Works in which LLMs prompt themselves to generate feedback and then refine their own responses are categorized as intrinsic self-correction (a minimal prompting loop is sketched after this list). The survey finds that, in general tasks, intrinsic self-correction frameworks rarely succeed, because LLMs struggle to generate reliable feedback without external information.
  • It highlights that tasks with decomposable or easily verifiable responses (e.g., arithmetic reasoning) may be more amenable to intrinsic self-correction.
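As a concrete illustration of this setting, here is a minimal sketch, assuming a generic `llm` text-completion function and illustrative prompts, of an intrinsic self-correction loop in which the same model generates, critiques, and refines with no external information.

```python
# A minimal intrinsic self-correction loop (generate, critique, refine).
# `llm` is a hypothetical text-completion function; prompts are
# illustrative. No external information or tools are used.
def intrinsic_self_correct(llm, question: str, max_rounds: int = 2) -> str:
    response = llm(f"Question: {question}\nAnswer step by step:")
    for _ in range(max_rounds):
        # The same model critiques its own answer; the survey identifies
        # the unreliability of this step as the main bottleneck.
        feedback = llm(
            f"Question: {question}\nAnswer: {response}\n"
            "List any mistakes in this answer, or reply 'none':"
        )
        if feedback.strip().lower().startswith("none"):
            break
        response = llm(
            f"Question: {question}\nAnswer: {response}\n"
            f"Feedback: {feedback}\nRevised answer:"
        )
    return response
```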

Self-Correction with External Feedback:

  • This category includes frameworks in which LLMs use external tools or knowledge sources to generate feedback on their initial responses. Examples include using code interpreters for code generation tasks and search engines to verify QA responses (an interpreter-in-the-loop sketch follows this list).
  • The survey points out that frameworks that supply external information during feedback generation but not during initial response generation can overstate the effectiveness of self-correction, since the comparison against the initial response is no longer fair.
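Below is a minimal sketch of self-correction with external feedback from a code interpreter, in the spirit of self-debugging frameworks; `llm`, the prompts, and the test harness are illustrative assumptions, not the paper's implementation.

```python
# Sketch of self-correction with external feedback from a code
# interpreter: failed test output is fed back to the model.
# `llm`, the prompts, and the harness are illustrative assumptions.
import subprocess
import sys
import tempfile

def run_tests(code: str, test_code: str) -> str:
    """Run generated code plus tests; return stderr ('' means success)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stderr

def self_debug(llm, task: str, test_code: str, max_rounds: int = 3) -> str:
    code = llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        errors = run_tests(code, test_code)
        if not errors:
            break  # reliable external signal: all tests passed
        # The interpreter's error message serves as external feedback.
        code = llm(
            f"Task: {task}\nCode:\n{code}\n"
            f"The tests failed with:\n{errors}\nFix the code:"
        )
    return code
```

Unlike self-generated critiques, the interpreter's feedback is reliable by construction, which matches the survey's finding that such tasks are where self-correction works well.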

Self-Correction with Fine-Tuning:

  • The paper reviews approaches that apply supervised fine-tuning or reinforcement learning to train models to generate feedback and refine responses (a data-formatting sketch follows this list). Although effective, these methods often require large-scale training datasets, which can be impractical for many applications.
  • The study suggests that future research should seek to reduce this dependency on large datasets.
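As one illustration of the supervised route, a fine-tuning corpus for a corrector model might pair flawed responses and feedback with corrected responses; the JSONL schema, field names, and example below are hypothetical, not taken from the paper.

```python
# Illustrative JSONL formatting for supervised fine-tuning of a corrector
# model; the schema and example are hypothetical, not from the paper.
import json

def make_example(question, draft, feedback, revised):
    return {
        "prompt": (
            f"Question: {question}\n"
            f"Draft answer: {draft}\n"
            f"Feedback: {feedback}\n"
            "Revised answer:"
        ),
        "completion": f" {revised}",
    }

examples = [
    make_example(
        "What is 17 * 24?",
        "17 * 24 = 398",
        "The multiplication is wrong; 17 * 24 = 408.",
        "17 * 24 = 408",
    )
]

with open("corrector_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```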

Implications and Future Directions

The theoretical and practical implications of this research are significant:

  • Bottlenecks in Feedback Generation: Given that generating reliable feedback is identified as the key bottleneck in intrinsic self-correction, future research should focus on improving feedback mechanisms, for example by leveraging more sophisticated LLM-based evaluation metrics and confidence estimation techniques (a confidence-gated sketch follows this list).
  • Tasks Suitable for Self-Correction: Researchers should target tasks that inherently support self-correction frameworks, such as those with easily verifiable or decomposable responses.
  • Pre-training and Fine-Tuning Strategies: The paper calls for innovative pre-training strategies that might inherently endow LLMs with better self-correction capabilities. Moreover, reducing the need for extensive fine-tuning datasets can make self-correction frameworks more practical.
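One simple way to combine confidence estimation with self-correction, sketched below under assumed interfaces (`llm_sample`, `refine`) and an illustrative threshold, is to estimate confidence via self-consistency voting and only attempt a revision when agreement is low.

```python
# Sketch of confidence-gated self-correction: estimate confidence via
# self-consistency voting and only attempt revision when agreement is
# low. `llm_sample` and `refine` are assumed interfaces; the threshold
# is illustrative.
from collections import Counter

def answer_with_confidence(llm_sample, question: str, n: int = 5):
    samples = [llm_sample(f"Question: {question}\nAnswer:") for _ in range(n)]
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / n  # majority answer and its agreement rate

def maybe_self_correct(llm_sample, refine, question: str,
                       threshold: float = 0.6) -> str:
    answer, confidence = answer_with_confidence(llm_sample, question)
    if confidence >= threshold:
        return answer  # high agreement: revision risks making it worse
    return refine(question, answer)  # low agreement: attempt a revision
```

Gating on confidence addresses a failure mode the survey highlights: unconditional refinement can turn correct answers into incorrect ones.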

Conclusion

This critical survey brings much-needed clarity to the question of when and how LLMs can correct their own mistakes. By systematically categorizing research questions and scrutinizing prior work, the authors not only identify the current limitations but also pave the way for future advancements in the field. The detailed checklist provided for designing experiments ensures that future research can build on a solid and well-defined foundation, potentially accelerating the development of more effective self-correction frameworks.
