
SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

(arXiv:2303.05118)
Published Mar 9, 2023 in cs.CV, cs.AI, and cs.LG

Abstract

The goal of continual learning is to improve the performance of recognition models as they learn from sequentially arriving data. Although most existing works are established on the premise of learning from scratch, growing effort has been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM) and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research. Code has been made available at: https://github.com/GengDavid/SLCA.
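The abstract describes two components: a "Slow Learner" that fine-tunes the pre-trained representation layer with a much smaller learning rate than the classification layer, and a post-hoc "Classifier Alignment" step that models class-wise feature distributions and re-fits the classifier from them. The sketch below is a minimal PyTorch illustration of that general idea, assuming the model is split into a `backbone` feature extractor and a linear `classifier` head; all function names, hyperparameters, and the Gaussian-sampling details are illustrative assumptions, and the authors' released code (linked above) is the authoritative implementation, which may differ in specifics.

```python
import torch
import torch.nn as nn


def make_slow_learner_optimizer(backbone, classifier,
                                backbone_lr=1e-4, classifier_lr=1e-2):
    """Slow Learner: fine-tune the pre-trained representation layer with a
    much smaller learning rate than the task-specific classification layer."""
    return torch.optim.SGD(
        [
            {"params": backbone.parameters(), "lr": backbone_lr},
            {"params": classifier.parameters(), "lr": classifier_lr},
        ],
        lr=classifier_lr,  # default for any group without an explicit lr
        momentum=0.9,
    )


@torch.no_grad()
def collect_class_statistics(backbone, loader, device="cpu"):
    """Model each class of the current task as a Gaussian (mean, covariance)
    in the backbone's feature space, to be stored instead of raw exemplars."""
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
    feats, labels = torch.cat(feats), torch.cat(labels)
    stats = {}
    for c in labels.unique().tolist():
        f_c = feats[labels == c]
        stats[c] = (f_c.mean(0), torch.cov(f_c.T))  # per-class mean / covariance
    return stats


def align_classifier(classifier, stats, feat_dim,
                     steps=100, samples_per_class=64, lr=1e-2, device="cpu"):
    """Classifier Alignment: re-fit the classification layer post hoc on
    pseudo-features sampled from the stored class-wise Gaussians, so that
    heads learned in different tasks become mutually comparable."""
    opt = torch.optim.SGD(classifier.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    jitter = 1e-4 * torch.eye(feat_dim)  # keep covariances positive definite
    for _ in range(steps):
        xs, ys = [], []
        for c, (mean, cov) in stats.items():
            dist = torch.distributions.MultivariateNormal(
                mean, covariance_matrix=cov + jitter)
            xs.append(dist.sample((samples_per_class,)))
            ys.append(torch.full((samples_per_class,), c, dtype=torch.long))
        x, y = torch.cat(xs).to(device), torch.cat(ys).to(device)
        loss = ce(classifier(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this reading, the Slow Learner only changes the learning-rate configuration of standard sequential fine-tuning, while Classifier Alignment is a lightweight post-hoc step that draws on stored per-class statistics rather than revisiting raw data from earlier tasks.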


References
  1. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision, pages 139–154
  2. BEiT: BERT Pre-Training of Image Transformers
  3. Dark experience for general continual learning: a strong, simple baseline. In Advances in Neural Information Processing Systems, volume 33, pages 15920–15930
  4. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9640–9649
  5. Continual Pre-Training Mitigates Forgetting in Language and Vision
  6. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255
  7. Learning without memorizing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5138–5146
  8. The Turking Test: Can Language Models Understand Instructions?
  9. Self-supervised models are continual learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9621–9630
  10. Spottune: transfer learning through adaptive fine-tuning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4805–4814
  11. Remind your neural network to prevent catastrophic forgetting. In Proceedings of the European Conference on Computer Vision, pages 466–483
  12. Lifelong machine learning with deep streaming linear discriminant analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 220–221
  13. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009
  14. Rethinking imagenet pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4918–4927
  15. The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349
  16. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR
  17. How Well Does Self-Supervised Pre-Training Perform with Streaming Data?
  18. Continual Learning of Natural Language Processing Tasks: A Survey
  19. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526
  20. Similarity of neural network representations revisited. In Proceedings of International Conference on Machine Learning, pages 3519–3529. PMLR
  21. 3d object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 554–561
  22. Learning multiple layers of features from tiny images. Technical report
  23. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25, pages 1097–1105
  24. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059
  25. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947
  26. Representational Continuity for Unsupervised Continual Learning
  27. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419
  28. An Empirical Investigation of the Role of Pre-training in Lifelong Learning
  29. Xingchao Peng et al. Moment matching for multi-source domain adaptation. In ICCV
  30. Gdumb: A simple approach that questions our progress in continual learning. In Proceedings of European Conference on Computer Vision, pages 524–540
  31. Effect of scale on catastrophic forgetting in neural networks. In Proceedings of the International Conference on Learning Representations
  32. ImageNet-21K Pretraining for the Masses
  33. Progressive Neural Networks
  34. Overcoming catastrophic forgetting with hard attention to the task. In Proceedings of International Conference on Machine Learning, pages 4548–4557
  35. Incremental learning of object detectors without catastrophic forgetting. In Proceedings of the IEEE international conference on computer vision, pages 3400–3409
  36. Training data-efficient image transformers & distillation through attention. In Proceedings of International Conference on Machine Learning, pages 10347–10357. PMLR
  37. Three scenarios for continual learning
  38. The caltech-ucsd birds-200-2011 dataset. 2011.
  39. Triple-memory networks: A brain-inspired method for continual learning. IEEE Transactions on Neural Networks and Learning Systems
  40. Ordisco: Effective and efficient usage of incremental unlabeled data for semi-supervised continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5383–5392
  41. Afec: Active forgetting of negative transfer in continual learning. In Advances in Neural Information Processing Systems, volume 34
  42. Coscl: Cooperation of small continual learners is stronger than a big one. In European Conference on Computer Vision, pages 254–271. Springer
  43. A Comprehensive Survey of Continual Learning: Theory, Method and Application
  44. Memory replay with data compression for continual learning. In Proceedings of the International Conference on Learning Representations
  45. DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
  46. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149
  47. Mitigating Neural Network Overconfidence with Logit Normalization
  48. Class-incremental learning with strong pre-trained models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9601–9610
  49. Large scale incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 374–382
  50. Continual object detection via prototypical task correlation guided gating mechanism. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9255–9264
  51. Continual learning through synaptic intelligence. In Proceedings of the International Conference on Machine Learning, pages 3987–3995
  52. Few-shot segmentation via cycle-consistent transformer. Advances in Neural Information Processing Systems, 34:21984–21996
  53. Revisiting Few-sample BERT Fine-tuning
  54. Mining unseen classes via regional objectness: A simple baseline for incremental segmentation. NeurIPS, 35
  55. Coinseg: Contrast inter- and intra- class representations for incremental segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision
  56. Prototype augmentation and self-supervision for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5871–5880
  57. Ctp: Towards vision-language continual pretraining via compatible momentum contrast and topology preservation

Show All 57