Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models

(2402.11436)
Published Feb 18, 2024 in cs.CL and cs.AI

Abstract

Recent studies show that self-feedback improves LLMs on certain tasks while worsening others. We find that this discrepancy is due to LLMs' bias towards their own output. In this paper, we formally define LLM self-bias -- the tendency to favor its own generation -- using two statistics. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we find that larger model sizes and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement on downstream tasks.

Figure: bias and distance skewness in LLaMA-2 models decrease with model size across Yoruba-to-English (Yor-En) translation self-refinement.

Overview

  • The paper investigates self-bias in LLMs across various tasks such as translation, constrained text generation, and mathematical reasoning, uncovering its universal presence and impact on model performance.

  • A novel approach to quantify self-bias is introduced, highlighting an amplification of self-bias across iterations of self-refinement in LLMs, despite improvements in fluency and understandability.

  • The research finds that increasing the model size and integrating external feedback can mitigate self-bias, with larger models showing reduced bias due to their enhanced evaluative capacities.

  • The study calls for further exploration into the dynamics of self-bias across model architectures, tasks, and languages, emphasizing the need for new strategies to ensure the integrity of LLMs.

Unveiling Self-Bias in LLMs Across Diverse Tasks

Introduction to Self-Bias in LLMs

In the evolving landscape of LLMs, the phenomenon of self-bias — where models exhibit a preference for their own generations — presents a nuanced challenge. The study under discussion explores this issue, presenting a comprehensive analysis of self-bias across six diverse LLMs engaged in translation, constrained text generation, and mathematical reasoning tasks. This exploration uncovers the universal presence of self-bias and emphasizes its implications for model performance and output quality.

Quantification of Self-Bias

The paper introduces a novel approach to quantify self-bias in LLMs, employing two principal statistics: bias estimation and distance skewness. These metrics capture the discrepancy between an LLM's self-evaluation and its actual performance, revealing a consistent amplification of self-bias across multiple iterations of self-refinement. The findings suggest that, despite improvements in fluency and understandability, self-refinement does not necessarily deliver the desired outcomes, such as higher quality or broader concept coverage.
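To make the two statistics concrete, here is a minimal sketch of how they can be estimated from paired scores. It assumes the standard sample forms: bias as the mean gap between the model's self-assigned quality score and an external quality estimate, and distance skewness as a measure of how asymmetrically those gaps sit around zero. The array names and scoring setup are illustrative, not the paper's exact implementation.

```python
import numpy as np

def bias_estimation(self_scores, true_scores):
    # Mean gap between the model's self-assigned quality and an external
    # quality estimate; positive values indicate the model inflates itself.
    gaps = np.asarray(self_scores, dtype=float) - np.asarray(true_scores, dtype=float)
    return gaps.mean()

def distance_skewness(self_scores, true_scores):
    # Distance skewness of the score gaps: 0 when the gaps are distributed
    # symmetrically around zero, approaching 1 under strong one-sided bias.
    gaps = np.asarray(self_scores, dtype=float) - np.asarray(true_scores, dtype=float)
    abs_diffs = np.abs(gaps[:, None] - gaps[None, :]).sum()
    abs_sums = np.abs(gaps[:, None] + gaps[None, :]).sum()
    return 1.0 - abs_diffs / abs_sums if abs_sums > 0 else 0.0
```

A positive bias estimate together with a distance skewness that grows across refinement iterations is the signature of amplifying self-bias that the paper reports.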

Analysis Across Tasks

Translation

Investigations into translation tasks reveal that self-bias not only persists but also intensifies with iterative self-refinement. Notably, open-source LLMs and certain versions of commercially available models display higher self-bias levels. This amplification suggests a misalignment between perceived and actual performance improvements, with models favoring their generative style over substantive quality enhancements.
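The iterative setting can be pictured as a loop that alternates generation, self-feedback, and revision while logging the gap between the model's own score and an external metric at each step. The sketch below is only illustrative: `model_generate`, `model_score`, and `external_score` are hypothetical callables standing in for the LLM and for an external quality metric such as COMET, not the paper's actual pipeline.

```python
def self_refine_with_bias_tracking(model_generate, model_score,
                                   external_score, source, iterations=4):
    # Self-refine loop for translation: at each step the model critiques and
    # rewrites its own output; we log the gap between its self-assigned score
    # and an external metric to watch self-bias evolve across iterations.
    output = model_generate(f"Translate to English: {source}")
    history = []
    for step in range(iterations):
        self_quality = model_score(source, output)     # model's own judgment
        true_quality = external_score(source, output)  # external metric / human proxy
        history.append({"step": step,
                        "self": self_quality,
                        "external": true_quality,
                        "gap": self_quality - true_quality})
        feedback = model_generate(
            f"Critique this translation of '{source}': {output}")
        output = model_generate(
            f"Improve the translation using the feedback.\n"
            f"Source: {source}\nDraft: {output}\nFeedback: {feedback}")
    return output, history
```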

Constrained Text Generation

For constrained text generation, the study highlights a similar trend of escalating self-bias. The analysis indicates that models may optimize for false positives — improvements that are not genuinely beneficial — leading to a cycle of unproductive optimization and reduced diversity in text generation.
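One way to make the "false positive" notion operational is to compare, for each refinement step, whether the model's self-score rose against whether an external measure actually improved. This is a hypothetical labeling helper, not the paper's evaluation code:

```python
def classify_refinement(prev_self, new_self, prev_true, new_true):
    # Label one refinement step. A "false positive" is a step the model
    # rates as better while the external measure shows no real gain.
    self_improved = new_self > prev_self
    truly_improved = new_true > prev_true
    if self_improved and truly_improved:
        return "true positive"
    if self_improved and not truly_improved:
        return "false positive"  # the unproductive optimization described above
    return "no claimed improvement"
```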

Mathematical Reasoning

In tasks involving mathematical reasoning, the presence of self-bias underscores the challenges LLMs face in self-correction. Despite engaging in iterative refinement, models tend to favor certain reasoning paths, which may not lead to correct solutions, further evidencing the pervasive nature of self-bias across different domains.

Addressing Self-Bias

To mitigate self-bias, the paper proposes two primary interventions: increasing the model size and integrating external feedback. Larger models demonstrate reduced self-bias, possibly due to their enhanced evaluative and corrective capacities. Moreover, external feedback, characterized by accurate assessment, significantly diminishes bias, guiding models towards more accurate self-corrections and genuine performance improvements.
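In practice, the external-feedback intervention amounts to replacing the model's own critique with an independent one. The sketch below shows that substitution under the same hypothetical interface as before; `external_critic` stands in for a stronger model, a metric that produces error annotations, or a human reviewer, and is not a specific API from the paper.

```python
def refine_with_external_feedback(model_generate, external_critic,
                                  source, draft, iterations=4):
    # Variant of the refine loop where the critique comes from an external
    # assessor rather than the generator itself -- the mitigation the paper
    # finds most effective when the assessment is accurate.
    output = draft
    for _ in range(iterations):
        feedback = external_critic(source, output)  # independent assessment
        if not feedback:                            # stop when nothing is left to fix
            break
        output = model_generate(
            f"Revise the output to address the feedback.\n"
            f"Source: {source}\nDraft: {output}\nFeedback: {feedback}")
    return output
```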

Theoretical and Practical Implications

The research provides a foundational perspective on the mechanisms of self-bias in LLMs, contributing to our understanding of model behaviors in self-refinement and self-rewarding pipelines. Practically, the findings emphasize the need for incorporating mechanisms — such as external feedback and adjusting model sizes — to counterbalance self-bias and enhance the reliability of LLMs across tasks.

Speculating on Future Developments

Looking forward, the paper speculates on the evolution of methodologies to detect, quantify, and mitigate self-bias in LLMs. It calls for further exploration into the dynamics of self-bias across different model architectures, tasks, and languages, underscoring the importance of developing more nuanced and effective strategies to ensure the integrity and applicability of LLMs in diverse real-world scenarios.

Conclusion

The exploration of self-bias in LLMs highlights a critical challenge in the field of AI and machine learning. By systematically analyzing and addressing this issue, the research contributes valuable insights towards the development of more robust, accurate, and unbiased language models, paving the way for advancements that align closely with human evaluative standards and expectations.
