Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning

Published 21 Feb 2024 in cs.CL | (2402.13669v2)

Abstract: The surge in LLMs has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to the vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft.



Summary

The paper introduces SDFT, a self-distillation method that generates a distilled dataset to mitigate catastrophic forgetting during fine-tuning.
It employs techniques like LoRA to ensure computational efficiency while preserving core model capabilities.
Empirical results demonstrate that SDFT improves performance and safety alignment in LLMs such as Llama-2-chat across diverse benchmarks.

An Expert Overview of "Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning"

The paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning" addresses a significant challenge in fine-tuning LLMs for specific tasks. When general-purpose models in the vein of GPT-3 and PaLM are adapted to specialized applications, a gap often opens between the distribution of the task dataset and the model's own distribution, primarily because of differences in task specifications. The paper posits that this gap is a critical factor in the loss of general capabilities, such as instruction following, during task-specific fine-tuning, a phenomenon commonly known as catastrophic forgetting.

Research Contributions

The principal contribution of the paper is the introduction of a method termed Self-Distillation Fine-Tuning (SDFT). SDFT innovatively bridges the distribution gap by utilizing the LLM to generate a distilled dataset that mirrors the model's original distribution prior to fine-tuning. This dataset serves as guidance for the model during subsequent fine-tuning processes. Notably, this approach seeks to maintain baseline general capabilities while enhancing performance on specific downstream tasks.

Methodology

Self-Distillation Approach:

Dataset Generation: SDFT prompts the LLM to rewrite each original task response in its own words, producing a distilled response. Because the model generates the rewrite itself, the distilled targets stay close to its pre-fine-tuning distribution. Figure 1 in the paper illustrates how this helps preserve the model's other capabilities.
Task-Specific Fine-Tuning: The LLM is fine-tuned on this newly constructed distilled dataset rather than the original task-specific set, slowing distribution drift and minimizing the loss of non-target capabilities.
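The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the prompt wording in DISTILL_TEMPLATE, the generate callable (a stand-in for sampling from the model before fine-tuning), and the acceptance check with fallback to the original response are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical rewrite prompt; the wording is an assumption, not the paper's template.
DISTILL_TEMPLATE = (
    "Below is an instruction and a reference answer.\n"
    "Instruction: {instruction}\n"
    "Reference answer: {response}\n"
    "Rewrite the reference answer in your own words, keeping it correct.\n"
)


def keep_if_nonempty(original: str, rewritten: str) -> bool:
    """Placeholder quality check (assumption): accept any non-blank rewrite."""
    return bool(rewritten.strip())


def build_distilled_dataset(
    examples: List[Dict[str, str]],
    generate: Callable[[str], str],
    is_acceptable: Callable[[str, str], bool] = keep_if_nonempty,
) -> List[Dict[str, str]]:
    """Replace each task response with the model's own rewrite of it."""
    distilled = []
    for ex in examples:
        prompt = DISTILL_TEMPLATE.format(
            instruction=ex["instruction"], response=ex["response"]
        )
        rewritten = generate(prompt)
        # Assumed fallback: keep the original response if the rewrite fails the check.
        target = rewritten if is_acceptable(ex["response"], rewritten) else ex["response"]
        distilled.append({"instruction": ex["instruction"], "response": target})
    return distilled


def stub_generate(prompt: str) -> str:
    """Stand-in for a call to the model before fine-tuning."""
    return "Adding 2 and 2 gives 4."


task_data = [{"instruction": "What is 2 + 2?", "response": "4"}]
distilled = build_distilled_dataset(task_data, stub_generate)
print(distilled[0]["response"])
```

The distilled list would then replace the original task data as the fine-tuning set. In practice generate would wrap the model's actual decoding call, and the acceptance check would be task-specific.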

The paper uses LoRA (Low-Rank Adaptation) to keep the computational cost of model adaptation low.
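To show why LoRA is cheap, the sketch below compares the number of trainable parameters in a full update of one weight matrix with a rank-r update of the form B·A, where B is d_out × r and A is r × d_in. The matrix size 4096 × 4096 and the rank 8 are illustrative assumptions, not settings taken from the paper.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


# Illustrative sizes only (assumptions, not the paper's configuration).
d_out = d_in = 4096
rank = 8

full = full_update_params(d_out, d_in)
low_rank = lora_params(d_out, d_in, rank)
print(full, low_rank, low_rank / full)
```

With these illustrative sizes, the low-rank update trains well under 1% of the parameters of the full matrix.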

Experimental Insights and Results

The authors evaluate SDFT with the Llama-2-chat model on benchmarks covering mathematical reasoning, code generation, and general alignment. Models fine-tuned with SDFT consistently retain previously learned capabilities better than models fine-tuned in the standard way, while achieving comparable or improved task-specific performance. For instance, on the HumanEval coding benchmark, pass@1 moved from 13.4 to 15.2 under SDFT, so the 27% performance loss associated with fine-tuning was not only mitigated but turned into a gain over the baseline.
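The HumanEval figures can be put on a common scale with a short calculation. The sketch assumes that the 27% figure is a relative loss measured against the original score of 13.4 under vanilla fine-tuning. The summary does not state this explicitly, so the implied vanilla score below is a derived estimate, not a reported result.

```python
def relative_change(before: float, after: float) -> float:
    """Signed change of `after` relative to `before`."""
    return (after - before) / before


original_pass1 = 13.4  # pass@1 before fine-tuning, from the text
sdft_pass1 = 15.2      # pass@1 after SDFT, from the text

# Assumption: the 27% loss is relative to the original score.
vanilla_relative_loss = 0.27

sdft_gain = relative_change(original_pass1, sdft_pass1)
implied_vanilla_pass1 = original_pass1 * (1 - vanilla_relative_loss)

print(f"SDFT relative gain: {sdft_gain:.1%}")
print(f"Implied vanilla pass@1: {implied_vanilla_pass1:.1f}")
```

Under that reading, SDFT corresponds to a relative gain of roughly 13% over the original model, against a relative loss of 27% without it.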

Safety and Helpfulness Alignment:

The empirical evaluations reveal that SDFT holds potential for preserving safety and helpfulness alignments in LLMs. Standard fine-tuning often degrades these metrics, posing safety risks. SDFT effectively mitigates degradation (e.g., safety alignment saw less than a 1% drop compared to up to 20% with vanilla fine-tuning).

Theoretical Implications and Future Research

From a theoretical standpoint, SDFT suggests a promising trajectory in model fine-tuning paradigms that prioritize the preservation of generalized language capabilities. It provides a methodological foundation upon which future improvements can be constructed, particularly pertaining to efficiency and scope in real-world applications.

Impactful future developments could explore:

More advanced distillation techniques or blended methods that combine self-distillation with other continual learning approaches to further alleviate catastrophic forgetting.
Broader evaluations against a wider variety of LLM architectures and more diversified datasets to validate the generalizability of SDFT.
Exploration into more nuanced safety and task alignment conditions, ensuring the model’s robust real-time response across diverse unseen scenarios.

Conclusion

In summary, the paper significantly advances the conversation on LLM fine-tuning by proposing SDFT. The method reduces the distributional discrepancies that jeopardize LLMs' broader abilities. As models grow more capable, approaches like SDFT will be important for balancing task-specific performance against general competence, giving finer control over model adaptation.
