Examining Forgetting in Continual Pre-training of Aligned Large Language Models (2401.03129v1)
Abstract: Recent advances in large language models (LLMs) have yielded remarkable proficiency across various tasks. Given their potential applications in numerous fields, there has been a surge in LLM development. A common practice in developing LLMs is continual pre-training on a previously fine-tuned model; however, this can lead to catastrophic forgetting. In this work, we investigate the forgetting that occurs when an existing fine-tuned LLM undergoes continual pre-training. We evaluate its impact on the fine-tuned model across several dimensions, including output format, knowledge, and reliability. Experimental results highlight the non-trivial challenge of mitigating catastrophic forgetting during continual pre-training, especially the repetition issue.
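The excerpt does not reproduce the paper's evaluation protocol, but the repetition issue it highlights can be quantified with a simple repeated n-gram statistic over model generations. The sketch below is an illustrative assumption, not the paper's metric; the helper `repeated_ngram_ratio` is a hypothetical name, and whitespace tokenization is used purely for simplicity.

```python
from collections import Counter

def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-grams in `text` that duplicate an earlier n-gram.

    Values near 0 indicate little repetition; values approaching 1
    indicate degenerate, looping output.
    """
    tokens = text.split()  # whitespace tokenization for simplicity
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeated / len(ngrams)

# Example: compare a response before and after continual pre-training.
before = "The capital of France is Paris. It is known for the Eiffel Tower."
after = "The capital of France is Paris. Paris. Paris. Paris. Paris. Paris."
print(f"before: {repeated_ngram_ratio(before, n=2):.2f}")
print(f"after:  {repeated_ngram_ratio(after, n=2):.2f}")
```

In practice such a score would be averaged over a batch of prompts, and the distributions before and after continual pre-training compared alongside the knowledge and reliability evaluations mentioned in the abstract.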