
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

(2405.05904)
Published May 9, 2024 in cs.CL

Abstract

When LLMs are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that LLMs struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that LLMs mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.

Figure: Accuracies over time during fine-tuning; Unknown examples are fit more slowly and hurt performance when overfit.

Overview

  • The study investigates how introducing new factual knowledge during the fine-tuning of LLMs affects their ability to utilize pre-existing knowledge and their propensity to generate factually incorrect outputs (hallucinations).

  • Researchers employed a controlled setup focusing on a closed-book question-answering task, categorizing fine-tuning examples into Known and Unknown types to analyze the impact on model performance and hallucinations.

  • Key findings indicate that examples introducing new knowledge are learned more slowly and, once learned, increase the risk of hallucinations. Practical recommendations include controlling the introduction of new knowledge, using early stopping, and filtering out Unknown examples to maintain model reliability.

Investigating the Impact of New Knowledge Introduction on LLMs

This essay takes an in-depth look at the paper's study of the consequences of introducing new factual knowledge to LLMs via supervised fine-tuning. The paper assesses how such fine-tuning affects a model's ability to utilize its pre-existing knowledge and its propensity to hallucinate factually incorrect responses.

Introduction

Pre-training LLMs on vast textual corpora embeds a considerable amount of factual knowledge in their parameters. This knowledge provides a foundation for various downstream applications. However, LLMs often require further alignment through supervised fine-tuning on instruction-following tasks and preference learning from human feedback. This process can introduce new factual information that deviates from the knowledge acquired during pre-training. A prevailing conjecture in the field postulates that exposure to new knowledge during fine-tuning could promote hallucinations, where models generate factually incorrect outputs.

Study Setup and Methodology

To analyze the impact of new knowledge in fine-tuning, the authors designed a controlled setup focused on a closed-book question-answering (QA) task. They categorized the fine-tuning examples into Known and Unknown types, with Known examples further divided into ClearlyKnown, MaybeKnown, and WeaklyKnown categories. The study evaluates how the proportion of Unknown examples in the fine-tuning dataset affects the model's performance and tendency to hallucinate.
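To make the categorization concrete, here is a minimal Python sketch of how each QA pair might be assigned to one of these categories by repeatedly querying the model with few-shot prompts. The `ask_model` helper, the prompt counts, the sampling temperature, and the exact-match check are illustrative assumptions, not the paper's exact procedure.

```python
import random

def ask_model(question, few_shot_prompt, temperature):
    """Hypothetical helper (an assumption, not a real API): query the model with a
    few-shot closed-book QA prompt and return a single answer string."""
    raise NotImplementedError  # plug in your own inference call here

def categorize_example(question, gold_answer, exemplar_pool, n_prompts=10):
    """Approximate the Known/Unknown categorization used in the study:
      ClearlyKnown -- greedy decoding answers correctly for every few-shot prompt
      MaybeKnown   -- greedy decoding answers correctly for some prompts
      WeaklyKnown  -- greedy is never correct, but temperature sampling sometimes is
      Unknown      -- the model never produces the correct answer
    Prompt counts and the exact-match check are illustrative assumptions."""
    def correct(answer):
        return answer.strip().lower() == gold_answer.strip().lower()

    prompts = [random.sample(exemplar_pool, 4) for _ in range(n_prompts)]
    greedy_hits = sum(correct(ask_model(question, p, temperature=0.0)) for p in prompts)
    if greedy_hits == n_prompts:
        return "ClearlyKnown"
    if greedy_hits > 0:
        return "MaybeKnown"

    sampled_hits = sum(correct(ask_model(question, p, temperature=0.5)) for p in prompts)
    return "WeaklyKnown" if sampled_hits > 0 else "Unknown"
```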

Key Findings

  1. Learning Dynamics: The study finds that Unknown examples are learned substantially slower than Known examples during fine-tuning. This suggests that LLMs struggle to integrate new factual knowledge through fine-tuning and primarily enhance their ability to utilize pre-existing knowledge.
  2. Hallucinations: There is a linear correlation between the proportion of Unknown examples learned and the model's tendency to hallucinate. This highlights the risk of introducing new factual knowledge through fine-tuning, which can compromise the model’s reliability by increasing hallucinations.
  3. Overfitting and Early-Stopping: Because Unknown examples are fit primarily in the later stages of training, their presence increases the risk of overfitting. The study demonstrates that early stopping mitigates this issue, improving performance on the held-out development set by preventing most Unknown examples from being fit.
  4. Filtering Unknown Examples: Removing Unknown examples from the fine-tuning dataset significantly reduces the risk of overfitting without sacrificing performance. This indicates that aligning the fine-tuning data with the model's pre-existing knowledge is crucial for optimal performance (a minimal filtering sketch follows this list).
  5. Performance Across Categories: Fine-tuning on ClearlyKnown examples alone does not yield the best results. Incorporating MaybeKnown examples, facts the model recalls only with lower certainty, is essential for the model to handle such cases correctly at inference time, and it improves overall performance.
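To make finding 4 concrete, here is a minimal filtering sketch in the same illustrative style as the categorization code above. The dict keys and the decision to keep WeaklyKnown examples are assumptions, not prescriptions from the paper.

```python
def split_by_knowledge(examples, categorize):
    """Partition fine-tuning examples into those consistent with the model's
    pre-existing knowledge and those that introduce new knowledge (Unknown).
    `examples` is assumed to be a list of dicts with 'question' and 'answer'
    keys, and `categorize` a function such as categorize_example sketched
    above -- both are illustrative assumptions."""
    known, unknown = [], []
    for ex in examples:
        label = categorize(ex["question"], ex["answer"])
        bucket = unknown if label == "Unknown" else known
        bucket.append({**ex, "knowledge_category": label})
    return known, unknown

# Usage sketch: fine-tune on `known` only; keep `unknown` aside for analysis.
# known, unknown = split_by_knowledge(
#     train_examples, lambda q, a: categorize_example(q, a, exemplar_pool))
```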

Implications for Practice and Theory

The study's findings have several practical implications. Fine-tuning with a high proportion of Unknown examples can degrade model performance and increase hallucinations. Thus, it is advisable to control the introduction of new factual knowledge during fine-tuning. Techniques such as early stopping and filtering out Unknown examples can be effective in maintaining model reliability.
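For the early-stopping recommendation, the following is a minimal sketch of the training-loop logic, assuming user-supplied training and evaluation callables rather than any particular fine-tuning library:

```python
def finetune_with_early_stopping(train_one_epoch, dev_accuracy, save_best, restore_best,
                                 max_epochs=20, patience=2):
    """Stop fine-tuning once held-out (development-set) accuracy plateaus.
    All four callables are user-supplied placeholders, not a specific library API:
      train_one_epoch() -- run one supervised fine-tuning epoch
      dev_accuracy()    -- return exact-match accuracy on a held-out dev set
      save_best() / restore_best() -- checkpoint handling
    Rationale from the paper: Unknown examples are fit mostly in later epochs,
    so stopping early avoids fitting most of them."""
    best_score, stale = float("-inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()
        score = dev_accuracy()
        if score > best_score:
            best_score, stale = score, 0
            save_best()
        else:
            stale += 1
            if stale >= patience:  # no improvement for `patience` consecutive epochs
                break
    restore_best()
    return best_score
```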

From a theoretical perspective, the findings support the hypothesis that LLMs mostly acquire factual knowledge through pre-training, while fine-tuning predominantly teaches models to use this knowledge more efficiently. This underscores the limited efficacy of supervised fine-tuning as a means to integrate new factual knowledge, suggesting a need for alternative methods or refined fine-tuning approaches.

Future Directions

Future research could explore various avenues to address these issues:

  • Developing robust methods for filtering or appropriately labeling new factual information encountered during fine-tuning.
  • Investigating the long-term effects of new knowledge introduction in broader and more diverse dataset contexts.
  • Exploring alternative fine-tuning strategies that can enhance the integration of new knowledge without promoting hallucinations.

Conclusion

The paper provides significant insights into the dynamics of knowledge acquisition in LLMs and the consequences of introducing new factual information through fine-tuning. The results demonstrate that while LLMs enhance their utilization of pre-existing knowledge through fine-tuning, they struggle to integrate new knowledge, which leads to increased hallucinations. Practitioners should consider these findings when designing fine-tuning processes to avoid adverse effects and maximize the efficiency of LLMs.
