
Continual Learning of Large Language Models: A Comprehensive Survey

(2404.16789)
Published Apr 25, 2024 in cs.LG, cs.AI, and cs.CL

Abstract

The recent success of LLMs trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.

Figure: Overview of stages in continually pre-training and fine-tuning large language models, with strategies to prevent forgetting.

Overview

  • The paper discusses how LLMs, typically built on transformer architectures, have made substantial progress but need continual learning (CL) to stay relevant by adapting to new data and knowledge over time.

  • It introduces concepts like vertical continuity, which focuses on adapting existing models from broad to specific tasks, and horizontal continuity, which deals with integrating new information over time without forgetting past knowledge.

  • The paper also outlines the stages in continual learning for LLMs: Continual Pre-training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT), along with rehearsal, regularization, and architectural strategies that help models absorb new knowledge while mitigating forgetting.

Overview of Continual Learning for LLMs

Introduction to LLMs and Continual Learning

Recent advancements in LLMs, built on transformer-based architectures, have ushered in a new era in understanding and generating human language. Fueled by the massive training data they consume, these models excel across a spectrum of tasks, from translation and summarization to more complex challenges such as dialogue systems. However, most of these models are trained on static, extensive datasets and gradually become outdated unless they are continually updated with new data and knowledge, a requirement that has given rise to the subfield of continual learning (CL) for LLMs.

CL in LLMs involves training models on a sequence of data or tasks while retaining knowledge acquired from past data. A core challenge here is avoiding catastrophic forgetting, the tendency of a neural network to entirely and abruptly forget previously learned information upon learning new data.
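To make catastrophic forgetting concrete, the sketch below (a minimal illustration, not code from the survey) measures it the way it is often reported: compare a causal LM's perplexity on held-out old-domain text before and after adaptation. The model name `gpt2` and the text lists `general_texts` and `adapted` model are placeholders assumed for the example.

```python
# Minimal sketch: quantify forgetting by comparing perplexity on old-domain
# text before and after adapting the model to new data. Assumes a Hugging Face
# causal LM and a list of held-out strings `general_texts`.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, texts, device="cpu"):
    """Approximate perplexity of `model` over a list of strings
    (exp of the mean per-text language-modeling loss)."""
    model.eval().to(device)
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        out = model(**enc, labels=enc["input_ids"])  # causal LM loss
        losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

# Usage sketch (placeholders):
# tok = AutoTokenizer.from_pretrained("gpt2")
# base = AutoModelForCausalLM.from_pretrained("gpt2")
# ppl_before = perplexity(base, tok, general_texts)
# ... fine-tune `base` on new-domain data to obtain `adapted` ...
# ppl_after = perplexity(adapted, tok, general_texts)
# A large rise in ppl_after relative to ppl_before signals forgetting.
```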

Detailed Insights into Continual Learning for LLMs

Two Axes of Continuity in LLM Training

  1. Vertical Continuity: Vertical continuity involves adapting an established LLM from broad tasks with large data scopes to specific tasks with a narrower focus. The adaptation proceeds in stages, from general large-scale datasets to smaller, more specialized ones. The risk here is 'vertical forgetting', where the model loses its broader, general capabilities once it is tuned for a narrow specialty.
  2. Horizontal Continuity: This form concerns adapting models over time to incorporate new trends, knowledge, or shifts in the data distribution without degrading performance on past data streams. The primary challenge is managing 'horizontal forgetting' over extended periods or across distinct data domains; a sketch of how such forgetting is commonly quantified follows this list.
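
One common way to quantify such forgetting, standard continual-learning bookkeeping rather than anything specific to this survey, is to record performance on each earlier domain after every training phase and average the drop from each domain's best score to its final score. A minimal sketch, with a hypothetical accuracy matrix `acc`:

```python
# acc[i][j] = performance on domain j measured after training on domain i.
# Forgetting for domain j is its best earlier score minus its final score;
# the overall measure averages these drops over all but the last domain.
def average_forgetting(acc):
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best_earlier = max(acc[i][j] for i in range(j, T - 1))
        drops.append(best_earlier - acc[T - 1][j])
    return sum(drops) / len(drops)

# Example with three domains learned in sequence (None = not yet seen):
# acc = [[0.90, None, None],
#        [0.85, 0.88, None],
#        [0.70, 0.80, 0.91]]
# average_forgetting(acc)  # (0.20 + 0.08) / 2 = 0.14
```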

Stages of Learning in Continually Adaptive LLMs

Three main stages define the spectrum of adapting LLMs in a continual learning context:

  1. Continual Pre-training (CPT): The model is repeatedly updated on sequentially or periodically collected new and diverse datasets to maintain and extend its general capabilities.
  2. Domain-Adaptive Pre-training (DAP): Before deployment or further task-specific training, the LLM continues pre-training on domain-specific corpora so that it performs well under specialized conditions.
  3. Continual Fine-Tuning (CFT): The final adaptation on narrow, task-specific datasets before deployment, ensuring the LLM performs well on its end-use tasks, such as specific language-understanding problems in defined contexts. A minimal sketch of how these stages chain together follows the list.
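
The sketch below outlines how the three stages chain together in practice. It is a simplified illustration, not code from the survey: the causal LM `model` and the three data loaders (`general_stream`, `domain_corpus`, `task_data`) are assumed to be defined elsewhere, and only the sequencing is shown.

```python
# Minimal staged-training sketch: the same update loop applied in sequence to
# general, domain-specific, and task-specific data. Batches are assumed to
# already contain `labels`, so model(**batch) returns a loss.
import torch

def train_stage(model, dataloader, lr, epochs=1, device="cpu"):
    """One training stage: standard loss-driven parameter updates."""
    model.train().to(device)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in dataloader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model

# Stage 1 -- Continual Pre-training (CPT): keep absorbing fresh general data.
# model = train_stage(model, general_stream, lr=1e-4)
# Stage 2 -- Domain-Adaptive Pre-training (DAP): continue pre-training on a
# domain corpus (e.g., biomedical or legal text) before any task tuning.
# model = train_stage(model, domain_corpus, lr=5e-5)
# Stage 3 -- Continual Fine-Tuning (CFT): adapt to the downstream task(s).
# model = train_stage(model, task_data, lr=2e-5)
```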

CL Techniques in LLMs and Their Implementation

Commonly used techniques in continual learning involve:

  • Rehearsal: Replaying a stored subset of earlier data alongside new data so the model revisits its past learning.
  • Regularization: Penalizing changes to parameters that are important for past tasks.
  • Architectural Strategies: Dynamically expanding the model's architecture to accommodate new knowledge without displacing the components that encode old information. A combined sketch of the first two strategies follows this list.
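
Below is a combined sketch of rehearsal and an EWC-style regularizer. It is illustrative only, not the survey's reference implementation: `replay_buffer` (stored earlier batches), `old_params` (a snapshot of parameters after the previous task), and `fisher` (diagonal Fisher importance estimates on the same device as the model) are assumed to exist.

```python
# Rehearsal + EWC-style regularization in a single training step.
import random
import torch

def ewc_penalty(model, old_params, fisher):
    """Quadratic penalty on drift, weighted by per-parameter importance."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return penalty

def continual_step(model, optim, new_batch, replay_buffer, old_params, fisher,
                   ewc_lambda=0.4, replay_prob=0.5):
    """One update on new data, occasionally rehearsing a stored old batch."""
    batch = new_batch
    if replay_buffer and random.random() < replay_prob:
        batch = random.choice(replay_buffer)   # rehearsal: revisit old data
    loss = model(**batch).loss                 # batch is assumed to contain labels
    loss = loss + ewc_lambda * ewc_penalty(model, old_params, fisher)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```

Architectural strategies (e.g., adding adapters or new expert modules per task) change the model structure itself rather than the loss, so they are not captured by this loop.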

Prospect and Future of Continually Learning LLMs

The continually evolving nature of data and tasks necessitates corresponding advances in LLMs' learning strategies. Future work might explore more efficient memory usage, adaptation techniques that allow personalizing models without extensive re-training, and theoretical underpinnings that can help better predict and manage model behavior over continual cycles.

Conclusion

Continual learning in LLMs is an active area of research poised for substantial growth. It integrates innovative AI research with practical applications, aiming to develop LLMs that can learn continuously and adaptively, similar to human learning processes. The path forward includes enhancing model robustness, improving learning efficiency, and developing models that can quickly adapt to new information while retaining valuable past knowledge.
