Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 167 tok/s
Gemini 2.5 Pro 48 tok/s Pro
GPT-5 Medium 33 tok/s Pro
GPT-5 High 40 tok/s Pro
GPT-4o 92 tok/s Pro
Kimi K2 193 tok/s Pro
GPT OSS 120B 425 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Stochastic Weight Averaging Revisited (2201.00519v4)

Published 3 Jan 2022 in cs.LG

Abstract: Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective, weight averaging (WA) contributes to variance reduction. Recently, a well-established stochastic weight averaging (SWA) method is proposed, which is featured by the application of a cyclical or high constant (CHC) learning rate schedule (LRS) in generating weight samples for WA. Then a new insight on WA appears, which states that WA helps to discover wider optima and then leads to better generalization. We conduct extensive experimental studies for SWA, involving a dozen modern DNN model structures and a dozen benchmark open-source image, graph, and text datasets. We disentangle contributions of the WA operation and the CHC LRS for SWA, showing that the WA operation in SWA still contributes to variance reduction but does not always lead to wide optima. The experimental results indicate that there are global scale geometric structures in the DNN loss landscape. We then present an algorithm termed periodic SWA (PSWA) which makes use of a series of WA operations to discover the global geometric structures. PSWA outperforms its backbone SGD remarkably, providing experimental evidences for the existence of global geometric structures. Codes for reproducing the experimental results are available at https://github.com/ZJLAB-AMMI/PSWA.

Citations (26)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Github Logo Streamline Icon: https://streamlinehq.com