Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting (1802.02950v4)

Published 8 Feb 2018 in cs.CV

Abstract: In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to other state-of-the-art in lifelong learning without forgetting.

Citations (243)

Summary

  • The paper’s main contribution is a reparameterization method that rotates the parameter space to optimize weight consolidation in EWC.
  • The technique reduces catastrophic forgetting by rotating the parameter space so that the Fisher Information Matrix is closer to diagonal, making EWC's diagonal approximation more accurate during sequential task learning.
  • Evaluations on MNIST, CIFAR-100, CUB-200, and Stanford-40 demonstrate its superior performance over standard EWC methods.

An Overview of "Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting"

The paper presents a novel approach to improving Elastic Weight Consolidation (EWC) for addressing catastrophic forgetting in sequential task learning of neural networks. Catastrophic forgetting occurs when a network forgets previous tasks as it learns new ones. EWC is a well-known method that mitigates this by adding a quadratic regularization term penalizing changes to parameters deemed important for previous tasks; however, it assumes a diagonal Fisher Information Matrix (FIM), which limits its effectiveness when the true FIM has substantial off-diagonal structure.
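
For reference, the standard EWC objective when training on a new task B after a task A is shown below (written in the usual EWC notation rather than this paper's); only the diagonal Fisher entries are kept, and making that diagonal approximation more faithful is exactly what the proposed rotation targets:

```latex
% EWC loss on task B, consolidating parameters learned on task A.
% Only the diagonal entries F_{ii} of the Fisher Information Matrix are used;
% \lambda trades off plasticity on task B against stability on task A.
\mathcal{L}(\theta) = \mathcal{L}_B(\theta)
  + \frac{\lambda}{2} \sum_i F_{ii}\,\bigl(\theta_i - \theta^{*}_{A,i}\bigr)^2,
\qquad
F_{ii} = \mathbb{E}\!\left[\left(\frac{\partial \log p_\theta(y \mid x)}{\partial \theta_i}\right)^{\!2}\right].
```

The off-diagonal terms of the full quadratic form $(\theta - \theta^{*}_{A})^{\top} F (\theta - \theta^{*}_{A})$ are dropped, which is the approximation the paper's reparameterization is designed to make more accurate.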

The authors propose a method that enhances EWC by approximately diagonalizing the FIM through a network reparameterization technique. This technique involves a factorized rotation of the parameter space, allowing for more effective weight consolidation without significant forgetting. The paper evaluates this approach against standard EWC on datasets such as MNIST, CIFAR-100, CUB-200, and Stanford-40, showing improved performance in lifelong learning scenarios.
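
The following is a minimal, illustrative sketch (our own simplification, not the authors' released code) of how such a rotation could be obtained for a single fully connected layer y = W x, assuming, as in Kronecker-factored approximations, that the layer's Fisher factorizes into the second moments of the layer input x and of the back-propagated gradient d = dL/dy:

```python
import numpy as np

def layer_rotations(X, D):
    """Estimate rotation matrices for one fully connected layer y = W @ x.

    X: (n_samples, in_dim)  layer inputs collected on the previous task
    D: (n_samples, out_dim) back-propagated gradients dL/dy on the same samples

    Assumes the layer's Fisher factorizes (Kronecker-style) as
    E[d d^T] (x) E[x x^T]; rotating each factor to its eigenbasis makes the
    factorized Fisher approximately diagonal.
    """
    Sxx = X.T @ X / len(X)           # E[x x^T], shape (in_dim, in_dim)
    Sdd = D.T @ D / len(D)           # E[d d^T], shape (out_dim, out_dim)
    U1, _, _ = np.linalg.svd(Sxx)    # rotation acting on the input side
    U2, _, _ = np.linalg.svd(Sdd)    # rotation acting on the output side
    return U1, U2

def rotate_weight(W, U1, U2):
    """Reparameterize W (out_dim, in_dim) into the rotated space: W' = U2^T W U1."""
    return U2.T @ W @ U1
```

In this rotated coordinate system, the diagonal of the Fisher for W' captures most of the curvature information, so the standard diagonal EWC penalty above can be applied with far less error.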

Key Contributions

  1. Network Reparameterization: The core contribution is a rotation of the parameter space that steers EWC toward solutions with less forgetting. Because directly rotating the full FIM (e.g., via a Singular Value Decomposition of the complete matrix) is computationally impractical, the method instead applies layer-wise rotations implemented as additional fixed layers in the network (see the sketch after this list).
  2. Practical Evaluation: The method was evaluated against conventional EWC on tasks split across several datasets. It outperformed EWC and achieved comparable or superior performance to other state-of-the-art lifelong learning algorithms without requiring stored exemplars.
  3. Diagonal Assumption in EWC: By rotating the parameter space, the technique better satisfies the diagonal assumption inherent in EWC, addressing one of EWC's primary practical drawbacks, since the true FIM is generally not diagonal.
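
To make contribution 1 concrete, here is a hedged sketch (our own illustration with hypothetical names, not the authors' implementation) of how the rotated weight could be wrapped between two fixed rotation layers so that the layer computes the same function as before, with only the rotated weight trained and consolidated by a standard diagonal EWC penalty:

```python
import numpy as np

class RotatedLinear:
    """Computes y = U2 @ (W_rot @ (U1.T @ x)), which equals the original W @ x
    when U1 and U2 are orthogonal. Only W_rot is trainable/consolidated;
    U1 and U2 act as additional fixed layers."""

    def __init__(self, W, U1, U2):
        self.U1, self.U2 = U1, U2        # fixed rotation "layers"
        self.W_rot = U2.T @ W @ U1       # trainable weight in rotated space

    def forward(self, x):
        # x: column vector of shape (in_dim,) or batch of shape (in_dim, n)
        return self.U2 @ (self.W_rot @ (self.U1.T @ x))

def ewc_penalty(W_rot, W_rot_old, fisher_diag, lam=1.0):
    """Diagonal EWC penalty applied in the rotated coordinates, where the
    diagonal Fisher approximation is (by construction) more accurate."""
    return 0.5 * lam * np.sum(fisher_diag * (W_rot - W_rot_old) ** 2)
```

Because U1 and U2 are orthogonal and frozen, the forward computation is unchanged; the only difference is the coordinate system in which weights are updated and consolidated.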

Implications and Future Directions

This work's implications are noteworthy for the domain of lifelong learning in neural networks. By enhancing EWC with a simple rotation of the parameter space, it opens avenues for training procedures that are more efficient and forget less. The method also extends to convolutional layers, making it applicable to the network architectures commonly used in computer vision tasks.

In future research, exploring this approach's limits, such as its applicability to larger, more complex datasets or its integration with more recent alternatives to EWC, could provide insights into building even more effective lifelong learning models. Additionally, investigating the theoretical underpinnings of why such rotations specifically lessen forgetting could inform the design of new algorithms.

Ultimately, this work advances the understanding of weight consolidation in neural networks, particularly for sequential task learning, mitigating catastrophic forgetting by refining and extending existing methodologies.
