SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model (2303.05118v4)
Abstract: The goal of continual learning is to improve the performance of recognition models when learning from sequentially arriving data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research. Code has been made available at: https://github.com/GengDavid/SLCA.
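To make the two components concrete, below is a minimal PyTorch sketch of what the abstract describes: a "slow learner" that assigns the pre-trained representation layer a much smaller learning rate than the classification layer, and a post-hoc "classifier alignment" step that models each seen class as a Gaussian over features and re-trains only the classifier on features sampled from those distributions. The module layout, learning-rate values, and sampling loop are illustrative assumptions based on the abstract, not the authors' released implementation (see the linked repository for that).

```python
# Sketch of Slow Learner + Classifier Alignment under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLPMModel(nn.Module):
    """Pre-trained representation layer plus a linear classification layer (assumed structure)."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone            # e.g. a pre-trained ViT/ResNet feature extractor
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)            # representation layer
        return self.classifier(feats), feats


def slow_learner_optimizer(model: CLPMModel, backbone_lr: float = 1e-4, head_lr: float = 1e-2):
    """Slow Learner: a much smaller learning rate for the pre-trained representation
    layer than for the classification layer (the specific values here are assumptions)."""
    return torch.optim.SGD(
        [
            {"params": model.backbone.parameters(), "lr": backbone_lr},
            {"params": model.classifier.parameters(), "lr": head_lr},
        ],
        momentum=0.9,
    )


@torch.no_grad()
def collect_class_statistics(model: CLPMModel, loader, device: str = "cpu"):
    """After each task, model every class seen so far as a Gaussian over its features."""
    model.eval()
    per_class = {}
    for x, y in loader:
        _, feats = model(x.to(device))
        for f, c in zip(feats.cpu(), y.tolist()):
            per_class.setdefault(c, []).append(f)
    stats = {}
    for c, fs in per_class.items():
        fs = torch.stack(fs)
        mean = fs.mean(dim=0)
        cov = torch.cov(fs.T) + 1e-4 * torch.eye(fs.size(1))   # regularize for sampling
        stats[c] = (mean, cov)
    return stats


def align_classifier(model: CLPMModel, class_stats, samples_per_class: int = 256,
                     epochs: int = 5, lr: float = 1e-2, batch_size: int = 128):
    """Classifier Alignment: post-hoc, re-train only the classification layer on
    features sampled from the stored class-wise Gaussians of all seen classes."""
    feats, labels = [], []
    for c, (mean, cov) in class_stats.items():
        dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
        feats.append(dist.sample((samples_per_class,)))
        labels.append(torch.full((samples_per_class,), c, dtype=torch.long))
    feats, labels = torch.cat(feats), torch.cat(labels)

    opt = torch.optim.SGD(model.classifier.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        perm = torch.randperm(feats.size(0))
        for i in range(0, feats.size(0), batch_size):
            idx = perm[i:i + batch_size]
            loss = F.cross_entropy(model.classifier(feats[idx]), labels[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In this sketch, continual training of each task would use `slow_learner_optimizer`, and after the final task (or after every task) `collect_class_statistics` followed by `align_classifier` realigns the classification layer without revisiting any stored images.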