FeTT: Continual Class Incremental Learning via Feature Transformation Tuning (2405.11822v1)
Abstract: Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios, enabling systems to continuously acquire knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards the use of pre-trained models (PTMs) with parameter-efficient fine-tuning (PEFT) strategies. However, continual fine-tuning still poses a serious risk of catastrophic forgetting due to the absence of previous task data. Additionally, the fine-tune-then-frozen mechanism suffers from performance limitations due to feature channel suppression and insufficient training data in the first CL task. To this end, this paper proposes the feature transformation tuning (FeTT) model, which non-parametrically tunes backbone features across all tasks; it not only operates independently of CL training data but also smooths feature channels to prevent excessive suppression. An extended ensemble strategy that combines different PTMs with the FeTT model yields further performance improvements. We further discuss the fine-tune-then-frozen paradigm and the FeTT model from the perspectives of discrepancy in class marginal distributions and feature channels. Extensive experiments on CL benchmarks validate the effectiveness of the proposed method.
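The abstract describes FeTT as a training-free, non-parametric adjustment of frozen backbone features that is applied across all tasks. As a rough illustration only, the sketch below assumes the transformation is a channel-wise power transform on non-negative PTM features, assumes classification is done with class prototypes (nearest class mean) accumulated task by task, and assumes the PTM ensemble is realized by concatenating features from multiple frozen backbones. None of these specifics are confirmed by the abstract; the paper's actual formulation may differ.

```python
# Hypothetical sketch of FeTT-style inference: frozen pre-trained backbones,
# a non-parametric channel-smoothing transform (assumed: power transform),
# and a prototype classifier grown task by task. Not the authors' code.
import torch
import torch.nn.functional as F


def feature_transform(feats: torch.Tensor, beta: float = 0.5) -> torch.Tensor:
    """Assumed non-parametric channel smoothing: power transform, beta in (0, 1]."""
    return feats.clamp(min=0).pow(beta)


class FeTTClassifier:
    """Nearest-class-mean classifier over transformed features from frozen PTMs."""

    def __init__(self, backbones, beta: float = 0.5):
        self.backbones = backbones          # list of frozen pre-trained backbones
        self.beta = beta
        self.prototypes = {}                # class id -> mean transformed feature

    @torch.no_grad()
    def extract(self, images: torch.Tensor) -> torch.Tensor:
        # Assumed ensemble: concatenate transformed features from all backbones.
        feats = [feature_transform(b(images), self.beta) for b in self.backbones]
        return torch.cat(feats, dim=-1)

    @torch.no_grad()
    def update_task(self, loader):
        # Accumulate class means for the current task's classes only; no
        # parameters are updated, so previously stored prototypes are untouched.
        sums, counts = {}, {}
        for images, labels in loader:
            feats = self.extract(images)
            for f, y in zip(feats, labels.tolist()):
                sums[y] = sums.get(y, torch.zeros_like(f)) + f
                counts[y] = counts.get(y, 0) + 1
        for y in sums:
            self.prototypes[y] = sums[y] / counts[y]

    @torch.no_grad()
    def predict(self, images: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between transformed features and class prototypes.
        feats = F.normalize(self.extract(images), dim=-1)
        classes = sorted(self.prototypes)
        protos = F.normalize(
            torch.stack([self.prototypes[c] for c in classes]), dim=-1
        )
        scores = feats @ protos.t()
        return torch.tensor(classes)[scores.argmax(dim=-1)]
```

In this reading, adding a new task only appends prototypes for its classes, so no previous task data or replay is needed; the abstract's optional first-task PEFT step (fine-tune-then-frozen) is omitted here for brevity.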
Authors: Sunyuan Qiang, Xuxin Lin, Yanyan Liang, Jun Wan, Du Zhang