
Abstract

LLMs are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: they involve LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unified perspective on summarization, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at https://github.com/IAAR-Shanghai/ICSFSurvey.

Figure: Hourglass model depicting the evolution of internal consistency.

Overview

  • The paper by Liang et al. surveys the issues of reasoning and hallucination in LLMs and introduces the concept of Internal Consistency as a theoretical framework to address these challenges.

  • The survey presents a Self-Feedback framework to enhance LLM capabilities through iterative self-improvement, involving self-evaluation and self-update mechanisms.

  • Future directions and implications include improving LLMs' textual self-awareness, balancing reasoning efficiency, and the need for robust evaluation metrics to holistically assess improvements.

Internal Consistency and Self-Feedback in LLMs: A Survey

LLMs have demonstrated extensive capabilities across various natural language tasks, yet they frequently exhibit issues related to reasoning and hallucination. Such deficiencies highlight the complex challenge of maintaining internal consistency within LLMs. In their comprehensive survey, Liang et al. propose a theoretical framework, Internal Consistency, that unifies the analysis of these challenges, and introduce the concept of Self-Feedback as a strategy to enhance model capabilities through iterative self-improvement.

Internal Consistency and its Evaluation

Internal Consistency is defined as the measure of coherence among the LLM's latent, decoding, and response layers, assessed through sampling. The proposed framework hinges on improving consistency at all three levels (a sampling-based scoring sketch follows this list):

  1. Response Consistency: Ensuring uniformity in responses across similar queries.
  2. Decoding Consistency: Stability in token selection during the decoding process.
  3. Latent Consistency: Reliability of internal states and attention mechanisms.
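As a rough illustration of the sampling-based scoring the framework relies on, the sketch below measures response-level consistency by sampling several answers to one query and checking agreement with the majority answer. The `generate` callable, the string normalization, and the sample count are illustrative placeholders, not part of the survey's method.

```python
# Minimal sketch of response-level consistency via sampling.
# `generate(prompt)` is a placeholder for any stochastic LLM call
# (e.g., sampling with temperature > 0); it is not an API from the paper.
from collections import Counter
from typing import Callable, List

def response_consistency(generate: Callable[[str], str], prompt: str, k: int = 8) -> float:
    """Sample k answers and return the fraction that agree with the majority answer."""
    answers: List[str] = [generate(prompt).strip().lower() for _ in range(k)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / k  # 1.0 = fully consistent, 1/k = no agreement at all
```

A Self-Consistency-style decoder can reuse the same tally to return the majority answer as the final response.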

The authors reveal the "Hourglass Evolution of Internal Consistency," illustrating how an LLM's consistency varies across layers. Latent states in the lower layers are largely random; consistency gradually improves through the intermediate and upper layers, but diverges again at the response-generation stage. This phenomenon underscores the significant role internal mechanisms play in consistency and the challenges at each stage of processing within LLMs.
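One way to observe this layer-wise behavior, purely as an illustration, is to decode intermediate hidden states with a logit-lens-style probe and watch where the top prediction stabilizes across depth. The sketch below assumes a GPT-2-style Hugging Face model; it is not the authors' experimental setup.

```python
# Illustrative logit-lens-style probe: project each layer's hidden state
# through the LM head and print the top predicted next token per layer,
# showing where the model's prediction stabilizes across depth.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

hidden_states = outputs.hidden_states  # embeddings plus one entry per block
for layer_idx, hidden in enumerate(hidden_states):
    h = hidden[:, -1, :]  # hidden state at the last input position
    if layer_idx < len(hidden_states) - 1:
        h = model.transformer.ln_f(h)  # final layer norm (already applied to the last entry)
    top_id = model.lm_head(h).argmax(dim=-1)
    print(f"layer {layer_idx:2d}: top next token = {tokenizer.decode(top_id)!r}")
```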

Self-Feedback Framework

The Self-Feedback framework proposed by Liang et al. involves two core modules: Self-Evaluation and Self-Update. The LLM initially performs Self-Evaluation by analyzing its outputs, and then uses the generated feedback to refine its responses or update its internal parameters. This self-improving feedback loop is fundamental to addressing issues of reasoning and hallucination.
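At the response layer, this loop can be sketched as a simple evaluate-then-refine cycle. In the sketch below, `llm`, the prompt wording, the "LGTM" stop signal, and the round limit are all illustrative placeholders rather than the paper's implementation; a Self-Update step could equally write to model parameters instead of revising the response.

```python
# Minimal sketch of a response-level Self-Feedback loop:
# Self-Evaluation critiques the current draft, Self-Update revises it.
# `llm(prompt)` stands in for any chat-completion call.
from typing import Callable

def self_feedback(llm: Callable[[str], str], question: str, max_rounds: int = 3) -> str:
    draft = llm(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        # Self-Evaluation: the model critiques its own draft.
        feedback = llm(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or reasoning errors, or reply 'LGTM' if there are none."
        )
        if "lgtm" in feedback.lower():
            break  # the model judges its own answer to be consistent
        # Self-Update: the model revises the draft using its own feedback.
        draft = llm(
            f"Question: {question}\nDraft answer: {draft}\nFeedback: {feedback}\n"
            "Rewrite the answer so that it addresses the feedback."
        )
    return draft
```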

Consistency Signal Acquisition

A crucial part of the Self-Feedback framework is the acquisition of consistency signals. The authors categorize these methods into six primary lines of work:

  1. Uncertainty Estimation: Estimating the model's uncertainty in its outputs to guide refinement.
  2. Confidence Estimation: Quantifying the model's confidence in its responses.
  3. Hallucination Detection: Identifying and mitigating unfaithful or incorrect content generated by the model.
  4. Verbal Critiquing: Allowing the LLM to generate critiques of its own outputs to facilitate iterative improvement.
  5. Contrastive Optimization: Optimizing the model by comparing different outputs and selecting the best.
  6. External Feedback: Leveraging external tools or more robust models to provide feedback on generated content.

Each method plays a distinct role in refining model outputs and enhancing internal consistency.
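As one concrete instance of such a signal, the sketch below estimates uncertainty as the mean entropy of the model's next-token distributions over an answer, assuming a Hugging Face causal LM; the choice of GPT-2 and of entropy as the score are illustrative assumptions, not the survey's reference implementation.

```python
# Illustrative uncertainty-estimation signal: average predictive entropy
# (in nats) over the tokens of an answer. Higher values suggest the model
# was less certain while producing the answer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_token_entropy(prompt: str, answer: str) -> float:
    """Mean entropy of the distributions that predicted each answer token."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab_size]
    # Positions prompt_len-1 .. seq_len-2 are the ones that predict the answer tokens.
    probs = F.softmax(logits[0, prompt_len - 1 : full_ids.shape[1] - 1], dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.mean().item()

print(mean_token_entropy("Q: What is the capital of France?\nA:", " Paris"))
```

A score like this can, for example, decide whether an additional refinement round is worth triggering.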

Critical Viewpoints and Future Directions

The survey introduces several critical viewpoints such as the "Consistency Is (Almost) Correctness" hypothesis, which posits that increasing a model's internal consistency generally results in improved overall correctness. This is predicated on the assumption that pre-training corpora are predominantly aligned with correct world knowledge.

Despite advancements, several challenges and future directions emerge from the survey:

  1. Textual Self-Awareness: Improving LLMs' ability to express their degrees of certainty and uncertainty in textual form.
  2. The Reasoning Paradox: Balancing latent and explicit reasoning to optimize reasoning efficiency without disrupting inference.
  3. Deeper Investigation: Moving beyond response-level improvements to explore decoding and latent states comprehensively.
  4. Unified Perspective: Integrating improvements across response, decoding, and latent layers to form a cohesive improvement strategy.
  5. Comprehensive Evaluation: Establishing robust evaluation metrics and benchmarks to holistically assess LLM capabilities and improvements.

Implications and Conclusions

The implications of refining LLMs through Internal Consistency Mining are manifold. Enhanced consistency not only improves the reliability of LLMs in various applications but also strengthens their alignment with human-like reasoning capabilities. The Self-Feedback framework represents a significant forward step in the iterative improvement of LLMs by leveraging their own feedback mechanisms.

In conclusion, this survey by Liang et al. offers a comprehensive and structured approach to addressing the core issues of reasoning and hallucination in LLMs through the lens of internal consistency. By systematically categorizing and evaluating various methods, the paper lays a solid foundation for ongoing and future research in enhancing the robustness and reliability of LLMs.
