- The paper proposes PEL, a parameter-efficient method that improves long-tailed recognition by reducing overfitting during fine-tuning.
- It employs semantic-aware classifier initialization using textual encodings to accelerate convergence and boost performance on underrepresented classes.
- With test-time ensembling, the framework consistently outperforms state-of-the-art methods across diverse long-tailed datasets while training for fewer than 20 epochs.
Parameter-Efficient Long-Tailed Recognition
The paper proposes a framework called Parameter-Efficient Long-Tailed Recognition (PEL) to adapt pre-trained models such as CLIP to long-tailed recognition tasks. It addresses a common challenge in computer vision: datasets whose classes are highly imbalanced, so that head classes are well-represented while tail classes have only a few examples. The work shows clear gains on this problem without requiring additional data or long training schedules, contributing to both the theory and practice of adapting pre-trained models.
Methodology Overview
PEL integrates a parameter-efficient fine-tuning method that introduces only a small number of task-specific parameters, mitigating the overfitting typically associated with conventional full fine-tuning. The framework uses semantic-aware classifier initialization derived from textual encodings of class descriptions in CLIP, so that adaptation starts from a semantically meaningful classifier rather than a random one and converges quickly. In addition, a test-time ensembling (TTE) technique aggregates predictions from perturbed versions of the input, improving generalization.
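As a concrete illustration of semantic-aware classifier initialization, the sketch below encodes a text prompt for each class name with CLIP's text encoder and copies the normalized embeddings into a linear classifier head. The prompt template, model variant, and class names are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: initialize a classifier head from CLIP text embeddings
# (assumes the openai/CLIP package; prompt template is an assumption).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

class_names = ["cat", "dog", "sparrow"]  # placeholder class names
prompts = [f"a photo of a {name}." for name in class_names]

with torch.no_grad():
    tokens = clip.tokenize(prompts).to(device)
    text_features = model.encode_text(tokens)               # (num_classes, dim)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Use the normalized text embeddings as the initial weights of a linear
# classifier over image features, instead of random initialization.
num_classes, feat_dim = text_features.shape
classifier = torch.nn.Linear(feat_dim, num_classes, bias=False)
with torch.no_grad():
    classifier.weight.copy_(text_features.float())
```

Starting from these weights, the classifier already behaves like CLIP's zero-shot classifier, which is one reason the initialization speeds up convergence on rare classes.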
Main Findings
- Parameter Efficiency: By adopting existing parameter-efficient fine-tuning methods, PEL retains the discriminative features needed to handle tail classes while training far fewer parameters than full model fine-tuning.
- Semantic Initialization: The semantic-aware initialization technique accelerates convergence and improves performance by leveraging the rich semantic information embedded within CLIP's textual encoder.
- Robust Performance: Experimental results across multiple long-tailed datasets—ImageNet-LT, Places-LT, iNaturalist 2018, and CIFAR-100-LT—show that PEL consistently outperforms previous state-of-the-art methods, including those that rely on external data for training. PEL reaches these results in fewer than 20 training epochs.
- Generality: The framework is general, supporting various parameter-efficient methods such as VPT, Adapter, and LoRA, which can be plugged in without extensive modification or computational overhead (a minimal LoRA-style sketch follows this list).
- Test-Time Ensembling: The TTE approach enhances generalization by mitigating biases introduced during data preprocessing (e.g., cropping), further improving predictive performance (see the second sketch after this list).
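To make the parameter-efficiency point concrete, here is a minimal LoRA-style sketch: a frozen pre-trained linear layer is augmented with a trainable low-rank update, so only the small A and B matrices are learned. The rank, scaling, and the `LoRALinear` wrapper are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Pre-trained path plus low-rank residual path.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

# Usage: wrap a projection layer of a frozen backbone.
proj = nn.Linear(768, 768)
proj_lora = LoRALinear(proj, rank=4)
out = proj_lora(torch.randn(2, 768))
```

Because `lora_B` starts at zero, the wrapped layer initially reproduces the frozen backbone exactly, and only the low-rank residual is updated during fine-tuning.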
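The test-time ensembling bullet can likewise be summarized by a short sketch: predictions from several perturbed views of the same image are averaged to reduce preprocessing bias. The specific perturbations below (center crop, a shifted crop, and a horizontal flip) and the `model` placeholder are illustrative assumptions rather than the paper's exact augmentation set.

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def tte_predict(model, image, crop_size=224):
    """Average softmax predictions over a few perturbed views of one image."""
    views = [
        TF.center_crop(image, [crop_size, crop_size]),
        TF.crop(image, 0, 0, crop_size, crop_size),               # top-left crop
        TF.hflip(TF.center_crop(image, [crop_size, crop_size])),  # flipped view
    ]
    batch = torch.stack(views)                 # (num_views, C, H, W)
    probs = model(batch).softmax(dim=-1)       # (num_views, num_classes)
    return probs.mean(dim=0)                   # ensembled class probabilities
```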
Implications and Future Research
The implications of this research extend to both the theoretical enhancement of model adaptation techniques and practical applications in domains where data imbalance is prevalent. Moreover, the methodology underscores the importance of semantic initialization and efficient fine-tuning in pre-trained models, which could inform future research on model adaptation strategies. Addressing long-tailed recognition without auxiliary data presents a robust solution for various applications where data collection remains challenging.
Future research may focus on refining these elements further, exploring model architectures beyond CLIP, and assessing how broadly semantic-aware initialization applies across different types of pre-trained models. Additionally, integrating other modality encoders and further reducing computational complexity could contribute to the ongoing development of efficient, adaptable recognition systems.