Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay (2309.04644v3)

Published 9 Sep 2023 in cs.LG

Abstract: Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, in which last-layer feature vectors of the same class "collapse" to a single point while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, the training loss, and the presence of last-layer BN. Our experiments substantiate these theoretical insights by showing that models exhibit a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm. Our findings offer a novel perspective on the role of BN and WD in shaping neural network features.
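The within-class collapse described in the abstract is commonly quantified with the variability-collapse metric tr(Σ_W Σ_B†)/C from the neural-collapse literature (Papyan et al., 2020), where Σ_W and Σ_B are the within-class and between-class covariances of last-layer features and C is the number of classes. Below is a minimal NumPy sketch of that metric; the function name and the toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def nc1_variability_collapse(features, labels):
    """Variability-collapse metric tr(Sigma_W @ pinv(Sigma_B)) / C.
    It tends to 0 as same-class features collapse onto their class means."""
    classes = np.unique(labels)
    C = len(classes)
    d = features.shape[1]
    global_mean = features.mean(axis=0)

    sigma_w = np.zeros((d, d))  # within-class covariance
    sigma_b = np.zeros((d, d))  # between-class covariance of class means
    for c in classes:
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        sigma_w += centered.T @ centered / len(class_feats)
        diff = (class_mean - global_mean)[:, None]
        sigma_b += diff @ diff.T
    sigma_w /= C
    sigma_b /= C

    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / C


# Toy usage (hypothetical data): 100 features of dimension 16 for 4 classes,
# generated close to their class means to mimic a nearly collapsed state.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=100)
class_means = rng.normal(size=(4, 16))
features = class_means[labels] + 0.01 * rng.normal(size=(100, 16))
print(nc1_variability_collapse(features, labels))  # small value -> strong collapse
```

In practice one would evaluate this on the penultimate-layer activations of the trained network at the end of training, comparing runs with and without last-layer BN and across WD values.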
