Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 77 tok/s
Gemini 2.5 Pro 45 tok/s Pro
GPT-5 Medium 30 tok/s Pro
GPT-5 High 28 tok/s Pro
GPT-4o 122 tok/s Pro
Kimi K2 178 tok/s Pro
GPT OSS 120B 450 tok/s Pro
Claude Sonnet 4.5 34 tok/s Pro
2000 character limit reached

Improving the convergence of SGD through adaptive batch sizes (1910.08222v4)

Published 18 Oct 2019 in cs.LG, math.OC, and stat.ML

Abstract: Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update but can yield high-variance gradient estimates, which poses some challenges for optimization. Conversely, large batches require more computation but can yield higher precision gradient estimates. This work presents a method to adapt the batch size to the model's training loss. For various function classes, we show that our method requires the same order of model updates as gradient descent while requiring the same order of gradient computations as SGD. This method requires evaluating the model's loss on the entire dataset every model update. However, the required computation is greatly reduced by approximating the training loss. We provide experiments that illustrate our methods require fewer model updates without increasing the total amount of computation.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.