
Abstract

Despite showing increasingly human-like abilities, LLMs often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially enhances the factual accuracy of Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.

Figure: Self-alignment for factuality, using the LLM's confidence in its self-generated biography as the alignment signal.

Overview

  • The paper introduces a novel method to reduce hallucinations in LLMs by using the models' self-evaluation capabilities for more factual content generation.

  • It details a methodology consisting of Self-Evaluation for Factuality (SELF-EVAL) and Self-Knowledge Tuning (SK-TUNING) aimed at improving the LLMs' response factuality and confidence calibration.

  • The approach demonstrates significant factual-accuracy improvements across tasks such as multiple-choice question answering and open-ended generation, outperforming existing strategies.

  • This research highlights the potential for self-evaluation and confidence tuning in LLMs to enhance model trustworthiness and factual alignment, suggesting new directions for future research.

Mitigating Hallucinations in LLMs Through Self-Evaluation and Confidence Tuning

Introduction

Hallucination, or the generation of plausible but factually incorrect statements, remains a prominent challenge in the deployment of LLMs across various applications. While LLMs have shown impressive abilities across a broad spectrum of NLP tasks, their tendency to generate hallucinated content undermines trust and restricts their utility, especially in knowledge-intensive tasks. Existing strategies to mitigate hallucinations predominantly rely on high-quality human annotations, posing scalability and generalizability challenges. In addressing these limitations, the paper "Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation" introduces a novel approach leveraging the intrinsic self-evaluation capability of LLMs to guide them towards generating more factual content.

Methodology

The methodology revolves around two core components: Self-Evaluation for Factuality (SELF-EVAL) and Self-Knowledge Tuning (SK-TUNING). SELF-EVAL prompts the LLM to assess the factuality of its own generated content using only its internal knowledge, producing confidence scores for the factual accuracy of its responses. To make these self-assessments reliable, SK-TUNING enhances the model's confidence estimation and calibration by constructing training examples from the model's own responses to diverse prompts, reflecting their factual correctness. The paper organizes these components into a three-step framework that ultimately aligns the model towards improved factuality via Direct Preference Optimization (DPO).
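The summary does not include reference code, so the following is a minimal sketch of how a SELF-EVAL-style confidence score and the resulting DPO preference pairs could be produced. It assumes a Hugging Face causal LM; the model name, prompt wording, confidence-margin filter, and helper functions are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): score the model's own answer for factuality
# by reading its probability of "True" in a True/False self-evaluation prompt, then
# turn higher- vs. lower-confidence samples into DPO-style (chosen, rejected) pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)
model.eval()

def self_eval_confidence(question: str, answer: str) -> float:
    """Return P('True') as the model's confidence that its own answer is factual."""
    prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Is the proposed answer factually correct? Answer True or False.\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]          # next-token distribution
    true_id = tokenizer.encode(" True", add_special_tokens=False)[0]
    false_id = tokenizer.encode(" False", add_special_tokens=False)[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=-1)
    return probs[0].item()

def build_dpo_pairs(question: str, sampled_answers: list[str], margin: float = 0.2):
    """Pair the most- and least-confident sampled answers as (chosen, rejected)."""
    scored = sorted(
        ((self_eval_confidence(question, a), a) for a in sampled_answers),
        reverse=True,
    )
    best_conf, best = scored[0]
    worst_conf, worst = scored[-1]
    if best_conf - worst_conf < margin:                 # skip uninformative pairs
        return []
    return [{"prompt": question, "chosen": best, "rejected": worst}]
```

Preference pairs produced this way could then be passed to an off-the-shelf DPO trainer (for example, trl's DPOTrainer) to complete the alignment step; the confidence-margin filter above is a simple heuristic for illustration, not a detail taken from the paper.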

Experimental Setup and Results

The framework's efficacy is quantitatively demonstrated across three knowledge-intensive tasks: multiple-choice question answering (MCQA), short-form open-ended generation, and long-form open-ended generation, using the TruthfulQA and BioGEN datasets. Notably, SELF-EVAL combined with SK-TUNING shows significant gains in factual accuracy across all tasks when benchmarked against decoding-based methods such as ITI and DoLa, as well as standard supervised fine-tuning (SFT). The results underscore the potential of self-evaluation and confidence tuning as a scalable and effective strategy for factuality alignment in LLMs, without the need for domain-specific annotations.

Implications and Future Directions

This research underscores the potential of self-evaluation capabilities in LLMs as a mechanism to internally assess and correct factual inaccuracies. The introduced self-alignment framework paves the way for developing more trustworthy and reliable LLMs by directly engaging with the models' inherent knowledge. The success of SELF-EVAL and SK-TUNING in enhancing factuality opens up new avenues for future research in model trustworthiness, particularly in extending these methods to larger models and exploring their integration with other factual alignment techniques. Moreover, the scalability of these approaches invites further investigation into their applicability across diverse languages and domains, potentially contributing to the broader objective of achieving generalizable and reliable machine understanding.

Conclusion

The paper presents a compelling approach to mitigating hallucinations in LLMs through self-evaluation and confidence tuning. By leveraging the models' inherent capabilities for self-assessment and dynamically tuning their confidence in generated responses, the framework demonstrates notable improvements in factual accuracy across several tasks. This strategy represents a significant step forward in enhancing the reliability and trustworthiness of LLM outputs, offering a scalable alternative to annotation-centric methods for factuality alignment.
