
GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction

(2312.07255)
Published Dec 12, 2023 in cs.CL and cs.CV

Abstract

The Parameter-Efficient Fine-Tuning (PEFT) method, which adjusts or introduces fewer trainable parameters to calibrate pre-trained models on downstream tasks, has become a recent research interest. However, existing PEFT methods within the traditional fine-tuning framework have two main shortcomings: 1) They overlook the explicit association between trainable parameters and downstream task knowledge. 2) They neglect the interaction between the intrinsic task-agnostic knowledge of pre-trained models and the task-specific knowledge in downstream tasks. To address this gap, we propose a novel fine-tuning framework, named GIST, in a plug-and-play manner. Specifically, our framework first introduces a trainable token, called the Gist token, when applying PEFT methods on downstream tasks. This token serves as an aggregator of the task-specific knowledge learned by the PEFT methods and forms an explicit association with downstream knowledge. Furthermore, to facilitate explicit interaction between task-agnostic and task-specific knowledge, we introduce the concept of Knowledge Interaction via a Bidirectional Kullback-Leibler Divergence objective. As a result, PEFT methods within our framework can make the pre-trained model understand downstream tasks more comprehensively by leveraging the knowledge interaction. Extensive experiments demonstrate the universality and scalability of our framework. Notably, on the VTAB-1K benchmark, we employ the Adapter (a prevalent PEFT method) within our GIST framework and achieve a performance boost of 2.25%, with an increase of only 0.8K parameters. The Code will be released.

Overview

  • GIST is a new plug-and-play fine-tuning framework that enhances Parameter-Efficient Fine-Tuning (PEFT) methods by focusing on knowledge interaction.

  • A novel Gist token is used to encapsulate task-specific knowledge (TSK) during fine-tuning, improving the model's task adaptation.

  • The framework employs Bidirectional Kullback-Leibler Divergence (BKLD) to integrate TSK with the task-agnostic knowledge (TAK) acquired during pre-training; a sketch of this objective follows the list.

  • Experiments show that PEFT methods within GIST outperform the same methods under the traditional fine-tuning framework across a range of benchmarks, with only a minimal increase in parameters.

  • GIST fosters more efficient and scalable AI systems by utilizing pre-trained models for specific tasks without significant computational resource demands.
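The summary does not spell out the exact form of the BKLD objective, but a symmetrized KL divergence between two predictive distributions is the natural reading. Writing p_TAK for the prediction driven by the pre-trained, task-agnostic pathway and p_TSK for the prediction aggregated by the Gist token (both labels are this summary's notation, not the paper's), the interaction term would be:

\[
\mathcal{L}_{\mathrm{BKLD}} = D_{\mathrm{KL}}\left(p_{\mathrm{TAK}} \,\|\, p_{\mathrm{TSK}}\right) + D_{\mathrm{KL}}\left(p_{\mathrm{TSK}} \,\|\, p_{\mathrm{TAK}}\right)
\]

Minimizing this term pulls the two distributions toward each other from both directions, which is what allows the task-specific knowledge and the pre-trained knowledge to inform one another.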

Understanding GIST: Enhanced Fine-Tuning for AI Models

Background

AI models, especially those based on Transformers, have seen substantial advancements and have significantly improved performance across numerous fields. However, the considerable size of these models presents challenges, particularly when fine-tuning for specific tasks. Each task typically requires separate, full-scale model training, leading to high storage costs and possible overfitting due to limited data in specialized tasks.

Parameter-Efficient Fine-Tuning

Recently, research has pivoted towards Parameter-Efficient Fine-Tuning (PEFT) methods. These methods aim to adjust or introduce only a minimal set of trainable parameters to adapt pre-trained models to new tasks. Though promising, PEFT within the traditional fine-tuning framework can be suboptimal as it does not create a clear connection between new parameters and task-specific knowledge (TSK), nor does it consider the interaction with the underlying task-agnostic knowledge (TAK) derived from general pre-training.
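To make "introducing only a minimal set of trainable parameters" concrete, below is a minimal PyTorch sketch of a bottleneck Adapter, the PEFT method highlighted later in this summary. The module name, hidden dimensions, and residual placement are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, added residually.
    Only these few parameters are trained; the pre-trained backbone stays frozen."""
    def __init__(self, dim: int = 768, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity mapping
        # and fine-tuning departs smoothly from the pre-trained behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

With dim = 768 and bottleneck = 8, such a module adds roughly 13K parameters per insertion point, which illustrates why PEFT methods train only a small fraction of the full model.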

Introducing GIST

To bridge these gaps, the researchers have developed a new fine-tuning framework called GIST, which reworks the fine-tuning process through two main features:

  1. Gist Token: A trainable token is introduced when applying PEFT methods to new tasks. This token serves as a dedicated vessel for absorbing TSK during fine-tuning. The Gist token adapts the concept of the Class token, which Transformers typically use to capture global information, and assigns it the role of aggregating the TSK learned by the PEFT parameters.
  2. Knowledge Interaction: To foster explicit interaction between the TAK that the model retains from pre-training and the TSK acquired during fine-tuning, the GIST framework introduces an objective based on Bidirectional Kullback-Leibler Divergence (BKLD). The model is thus trained to reconcile broad, general knowledge with specialized task knowledge; a minimal code sketch of both features follows this list.
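The summary leaves the architectural details to the paper, but one plausible way to realize both ideas in a ViT-style model is sketched below: a trainable Gist token is inserted next to the Class token, two heads read out predictions from the two tokens, and the BKLD term couples them. The wrapper name, the two-head readout, the loss weight lam, and the assumption that the encoder accepts and returns a token sequence are illustrative choices of this summary, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GistWrapper(nn.Module):
    """Illustrative wrapper: prepends a trainable Gist token to the token sequence of a
    frozen, PEFT-augmented Transformer encoder and reads out two sets of logits."""
    def __init__(self, encoder: nn.Module, dim: int = 768, num_classes: int = 100):
        super().__init__()
        self.encoder = encoder                                  # frozen backbone with PEFT modules
        self.gist_token = nn.Parameter(torch.zeros(1, 1, dim))  # the handful of extra parameters
        self.head_tak = nn.Linear(dim, num_classes)             # reads the Class token (task-agnostic path)
        self.head_tsk = nn.Linear(dim, num_classes)             # reads the Gist token (task-specific path)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, dim) with the Class token at position 0.
        gist = self.gist_token.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([tokens[:, :1], gist, tokens[:, 1:]], dim=1))
        return self.head_tak(out[:, 0]), self.head_tsk(out[:, 1])

def bidirectional_kl(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Symmetrized KL divergence between the two predictive distributions (the BKLD term)."""
    p = F.log_softmax(p_logits, dim=-1)
    q = F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(q, p, log_target=True, reduction="batchmean")
            + F.kl_div(p, q, log_target=True, reduction="batchmean"))

def gist_loss(logits_tak, logits_tsk, labels, lam: float = 1.0) -> torch.Tensor:
    """Illustrative training objective: supervise both readouts and couple them with BKLD."""
    return (F.cross_entropy(logits_tak, labels)
            + F.cross_entropy(logits_tsk, labels)
            + lam * bidirectional_kl(logits_tak, logits_tsk))
```

Training would then update only the PEFT parameters inside the encoder, the Gist token, and the heads, so the trainable-parameter budget stays close to that of the underlying PEFT method.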

Empirical Validation

When put to the test on various benchmarks, models fine-tuned within the GIST framework consistently outperform their traditional framework counterparts. The success is evident across a range of applications, from image classification to language understanding tasks—confirming the framework's adaptability and scalability.

What sets GIST apart is its ability to significantly enhance model performance with an almost negligible increase in the number of trainable parameters. For example, an experiment on the VTAB-1K benchmark using the Adapter (a popular PEFT method) within the GIST framework exhibited a performance increase of 2.25% while adding a mere 0.8K parameters (roughly what a single extra token contributes, assuming a ViT-B backbone with a hidden dimension of 768).

Conclusion

GIST marks a significant step in the evolution of PEFT techniques. It establishes a direct link between fine-tuning parameters and task-specific knowledge while letting that knowledge interact with the foundational knowledge the model accumulated during pre-training. The outcome is a scalable and efficient avenue for task adaptation, enabling AI systems that perform better on specific tasks without heavy demands on computational resources. Future research may continue to explore this direction, possibly uncovering even more effective ways of leveraging the vast knowledge embedded within pre-trained AI models.
