
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

(2405.05904)
Published May 9, 2024 in cs.CL

Abstract

When LLMs are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that LLMs struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that LLMs mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.

Figure: Accuracies over time during fine-tuning; Unknown examples are fit more slowly and hurt performance when overfit.

Overview

  • The study investigates how introducing new factual knowledge during the fine-tuning of LLMs affects their ability to utilize pre-existing knowledge and their propensity to generate factually incorrect outputs (hallucinations).

  • Researchers employed a controlled setup focusing on a closed-book question-answering task, categorizing fine-tuning examples into Known and Unknown types to analyze the impact on model performance and hallucinations.

  • Key findings indicate that examples introducing new knowledge are learned more slowly and, once learned, increase the risk of hallucinations. Practical recommendations include controlling the introduction of new knowledge, using early stopping, and filtering out Unknown examples to maintain model reliability.

Investigating the Impact of New Knowledge Introduction on LLMs

This essay takes an in-depth look at the paper's study of the consequences of introducing new factual knowledge to LLMs via supervised fine-tuning. The paper assesses how such fine-tuning affects a model's ability to utilize its pre-existing knowledge and its propensity to hallucinate factually incorrect responses.

Introduction

Pre-training LLMs on vast textual corpora embeds a considerable amount of factual knowledge in their parameters. This knowledge provides a foundation for various downstream applications. However, LLMs often require further alignment through supervised fine-tuning on instruction-following tasks and preference learning from human feedback. This process can introduce new factual information that deviates from the knowledge acquired during pre-training. A prevailing conjecture in the field postulates that exposure to new knowledge during fine-tuning could promote hallucinations, where models generate factually incorrect outputs.

Study Setup and Methodology

To analyze the impact of new knowledge in fine-tuning, the authors designed a controlled setup focused on a closed-book question-answering (QA) task. They categorized the fine-tuning examples into Known and Unknown types, with Known examples further divided into ClearlyKnown, MaybeKnown, and WeaklyKnown categories. The study evaluates how the proportion of Unknown examples in the fine-tuning dataset affects the model's performance and tendency to hallucinate.
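To make the categorization concrete, here is a minimal Python sketch of how each QA pair might be assigned to one of these categories by repeatedly querying the model with few-shot prompts. The `ask_model` helper, the prompt counts, the sampling temperature, and the exact-match check are illustrative assumptions, not the paper's exact procedure.

```python
import random

def ask_model(question, few_shot_prompt, temperature):
    """Hypothetical helper (an assumption, not a real API): query the model with a
    few-shot closed-book QA prompt and return a single answer string."""
    raise NotImplementedError  # plug in your own inference call here

def categorize_example(question, gold_answer, exemplar_pool, n_prompts=10):
    """Approximate the Known/Unknown categorization used in the study:
      ClearlyKnown -- greedy decoding answers correctly for every few-shot prompt
      MaybeKnown   -- greedy decoding answers correctly for some prompts
      WeaklyKnown  -- greedy is never correct, but temperature sampling sometimes is
      Unknown      -- the model never produces the correct answer
    Prompt counts and the exact-match check are illustrative assumptions."""
    def correct(answer):
        return answer.strip().lower() == gold_answer.strip().lower()

    prompts = [random.sample(exemplar_pool, 4) for _ in range(n_prompts)]
    greedy_hits = sum(correct(ask_model(question, p, temperature=0.0)) for p in prompts)
    if greedy_hits == n_prompts:
        return "ClearlyKnown"
    if greedy_hits > 0:
        return "MaybeKnown"

    sampled_hits = sum(correct(ask_model(question, p, temperature=0.5)) for p in prompts)
    return "WeaklyKnown" if sampled_hits > 0 else "Unknown"
```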

Key Findings

  1. Learning Dynamics: The study finds that Unknown examples are learned substantially slower than Known examples during fine-tuning. This suggests that LLMs struggle to integrate new factual knowledge through fine-tuning and primarily enhance their ability to utilize pre-existing knowledge.
  2. Hallucinations: There is a linear correlation between the proportion of Unknown examples learned and the model's tendency to hallucinate. This highlights the risk of introducing new factual knowledge through fine-tuning, which can compromise the model’s reliability by increasing hallucinations.
  3. Overfitting and Early-Stopping: Because Unknown examples are fit primarily in the later stages of training, their presence increases the risk of overfitting. The study demonstrates that early stopping mitigates this issue, improving performance on the held-out development set by preventing most Unknown examples from being fit.
  4. Filtering Unknown Examples: Removing Unknown examples from the fine-tuning dataset significantly reduces the risk of overfitting without sacrificing performance. This indicates that aligning the fine-tuning data with the model's pre-existing knowledge is crucial for optimal performance (a minimal filtering sketch follows this list).
  5. Performance Across Categories: Fine-tuning on ClearlyKnown examples alone does not yield the best results. Incorporating MaybeKnown examples, facts the model recalls only with lower certainty, is essential for the model to handle such cases correctly at inference time, and it improves overall performance.
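To make finding 4 concrete, here is a minimal filtering sketch in the same illustrative style as the categorization code above. The dict keys and the decision to keep WeaklyKnown examples are assumptions, not prescriptions from the paper.

```python
def split_by_knowledge(examples, categorize):
    """Partition fine-tuning examples into those consistent with the model's
    pre-existing knowledge and those that introduce new knowledge (Unknown).
    `examples` is assumed to be a list of dicts with 'question' and 'answer'
    keys, and `categorize` a function such as categorize_example sketched
    above -- both are illustrative assumptions."""
    known, unknown = [], []
    for ex in examples:
        label = categorize(ex["question"], ex["answer"])
        bucket = unknown if label == "Unknown" else known
        bucket.append({**ex, "knowledge_category": label})
    return known, unknown

# Usage sketch: fine-tune on `known` only; keep `unknown` aside for analysis.
# known, unknown = split_by_knowledge(
#     train_examples, lambda q, a: categorize_example(q, a, exemplar_pool))
```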

Implications for Practice and Theory

The study's findings have several practical implications. Fine-tuning with a high proportion of Unknown examples can degrade model performance and increase hallucinations. Thus, it is advisable to control the introduction of new factual knowledge during fine-tuning. Techniques such as early stopping and filtering out Unknown examples can be effective in maintaining model reliability.
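For the early-stopping recommendation, the following is a minimal sketch of the training-loop logic, assuming user-supplied training and evaluation callables rather than any particular fine-tuning library:

```python
def finetune_with_early_stopping(train_one_epoch, dev_accuracy, save_best, restore_best,
                                 max_epochs=20, patience=2):
    """Stop fine-tuning once held-out (development-set) accuracy plateaus.
    All four callables are user-supplied placeholders, not a specific library API:
      train_one_epoch() -- run one supervised fine-tuning epoch
      dev_accuracy()    -- return exact-match accuracy on a held-out dev set
      save_best() / restore_best() -- checkpoint handling
    Rationale from the paper: Unknown examples are fit mostly in later epochs,
    so stopping early avoids fitting most of them."""
    best_score, stale = float("-inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()
        score = dev_accuracy()
        if score > best_score:
            best_score, stale = score, 0
            save_best()
        else:
            stale += 1
            if stale >= patience:  # no improvement for `patience` consecutive epochs
                break
    restore_best()
    return best_score
```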

From a theoretical perspective, the findings support the hypothesis that LLMs mostly acquire factual knowledge through pre-training, while fine-tuning predominantly teaches models to use this knowledge more efficiently. This underscores the limited efficacy of supervised fine-tuning as a means to integrate new factual knowledge, suggesting a need for alternative methods or refined fine-tuning approaches.

Future Directions

Future research could explore various avenues to address these issues:

  • Developing robust methods for filtering or appropriately labeling new factual information encountered during fine-tuning.
  • Investigating the long-term effects of new knowledge introduction in broader and more diverse dataset contexts.
  • Exploring alternative fine-tuning strategies that can enhance the integration of new knowledge without promoting hallucinations.

Conclusion

The paper provides significant insights into the dynamics of knowledge acquisition in LLMs and the consequences of introducing new factual information through fine-tuning. The results demonstrate that while LLMs enhance their utilization of pre-existing knowledge through fine-tuning, they struggle to integrate new knowledge, which leads to increased hallucinations. Practitioners should consider these findings when designing fine-tuning processes to avoid adverse effects and maximize the efficiency of LLMs.
