Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 165 tok/s
Gemini 2.5 Pro 47 tok/s Pro
GPT-5 Medium 28 tok/s Pro
GPT-5 High 24 tok/s Pro
GPT-4o 112 tok/s Pro
Kimi K2 208 tok/s Pro
GPT OSS 120B 466 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network (2206.05794v7)

Published 12 Jun 2022 in cs.LG and stat.ML

Abstract: We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias toward rank minimization in the weight matrices. Specifically, we show both theoretically and empirically that this bias becomes more pronounced with smaller batch sizes, higher learning rates, or stronger weight decay. Additionally, we predict and empirically confirm that weight decay is essential for this bias to occur. Unlike previous literature, our analysis does not rely on assumptions about the data, convergence, or optimality of the weight matrices, making it applicable to a wide range of neural network architectures of any width or depth. Finally, we empirically explore the connection between this bias and generalization, finding that it has a marginal effect on the test performance.

Citations (5)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 1 tweet and received 2 likes.

Upgrade to Pro to view all of the tweets about this paper: