Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay (2309.04644v3)

Published 9 Sep 2023 in cs.LG

Abstract: Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, in which last-layer feature vectors for the same class "collapse" to a single point, while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, training loss, and the presence of last-layer BN. Our experiments substantiate these theoretical insights by showing that models exhibit a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm. Our findings offer a novel perspective on the role of BN and WD in shaping neural network features.
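
As a concrete illustration of the geometry described in the abstract, the sketch below shows one way to quantify Neural Collapse on a batch of last-layer features: an NC1-style ratio of within-class to between-class variability (which shrinks toward zero as same-class features collapse to their class mean), and an NC2-style check that the centered class means are equinorm and equiangular, approaching a simplex equiangular tight frame. This is a minimal NumPy sketch; the function name, the particular normalizations, and the random-feature demo are illustrative assumptions, not the exact measurements reported in the paper.

```python
# Minimal sketch (not the paper's exact metrics) for quantifying Neural Collapse
# on last-layer features. Assumes features of shape (n_samples, dim) and integer labels.
import numpy as np

def neural_collapse_metrics(features: np.ndarray, labels: np.ndarray) -> dict:
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)

    # NC1: within-class variability relative to between-class variability.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.mean(np.sum((features[labels == c] - class_means[i]) ** 2, axis=1))
        for i, c in enumerate(classes)
    ])
    between = np.mean(np.sum((class_means - global_mean) ** 2, axis=1))
    nc1 = within / (between + 1e-12)  # -> 0 as same-class features collapse

    # NC2 proxy: centered class means should be equinorm and equiangular.
    centered = class_means - global_mean
    norms = np.linalg.norm(centered, axis=1)
    unit = centered / norms[:, None]
    cosines = unit @ unit.T
    off_diag = cosines[~np.eye(len(classes), dtype=bool)]
    # For a simplex ETF with K classes, every pairwise cosine equals -1/(K-1).
    target = -1.0 / (len(classes) - 1)
    nc2 = np.mean(np.abs(off_diag - target))  # -> 0 as means approach a simplex ETF

    return {
        "nc1_within_over_between": nc1,
        "nc2_cosine_gap": nc2,
        "norm_std_over_mean": norms.std() / norms.mean(),  # -> 0 when equinorm
    }

# Demo on random features (no collapse expected, so the metrics stay far from zero):
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64))
labs = rng.integers(0, 10, size=1000)
print(neural_collapse_metrics(feats, labs))
```

In practice one would feed in the penultimate-layer activations of a trained network (e.g., trained with and without last-layer BN, or across a sweep of WD values) and track how these quantities evolve during the terminal phase of training.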
