
Continual Learning Beyond a Single Model

(arXiv:2202.09826)
Published Feb 20, 2022 in cs.LG and cs.AI

Abstract

A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspaces to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.
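As an illustrative sketch of the subspace idea mentioned in the abstract, the toy PyTorch example below trains a single line segment in weight space (two endpoint parameter sets, with a random interpolation point sampled at each training step) over a hypothetical stream of synthetic tasks, then averages predictions from several points on the segment at test time. The model, task stream, and hyperparameters are assumptions for illustration only, not the paper's actual architecture or algorithm.

```python
# Illustrative sketch only: a two-endpoint weight-space "subspace" classifier in the
# spirit of learned neural network subspaces. The synthetic task stream, model size,
# and hyperparameters are hypothetical, not taken from the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_dim, n_classes = 20, 5

# Two endpoint parameter sets defining a line segment in weight space.
w0 = torch.randn(n_classes, in_dim, requires_grad=True)
w1 = torch.randn(n_classes, in_dim, requires_grad=True)
b0 = torch.zeros(n_classes, requires_grad=True)
b1 = torch.zeros(n_classes, requires_grad=True)
opt = torch.optim.SGD([w0, w1, b0, b1], lr=0.1)


def forward(x, alpha):
    """Predict with interpolated weights w(alpha) = (1 - alpha) * w0 + alpha * w1."""
    w = (1 - alpha) * w0 + alpha * w1
    b = (1 - alpha) * b0 + alpha * b1
    return x @ w.t() + b


# Hypothetical continual stream: a few synthetic "tasks" seen one after another.
for task in range(3):
    xs = torch.randn(256, in_dim) + task          # shifted input distribution per task
    ys = torch.randint(0, n_classes, (256,))
    for step in range(100):
        alpha = torch.rand(())                    # sample one point in the subspace per step
        loss = F.cross_entropy(forward(xs, alpha), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Inference: average predictions from several subspace samples, which behaves like a
# cheap ensemble even though training cost stayed close to a single model's.
x_test = torch.randn(8, in_dim)
with torch.no_grad():
    probs = torch.stack([F.softmax(forward(x_test, a), dim=-1)
                         for a in torch.linspace(0, 1, 5)]).mean(0)
print(probs.argmax(dim=-1))
```

Sampling one interpolation coefficient per step keeps training cost close to that of a single model, while evaluating several coefficients at inference yields ensemble-like prediction averaging, which is the trade-off the abstract describes.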

