
Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds (1702.02030v1)

Published 7 Feb 2017 in cs.LG

Abstract: Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, current theoretical understandings of ERM for a related problem---stochastic convex optimization (SCO), are limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an $\widetilde{O}(d/n + \sqrt{F_*/n})$ risk bound when the random function is nonnegative, convex and smooth, and the expected function is Lipschitz continuous, where $d$ is the dimensionality of the problem, $n$ is the number of samples, and $F_*$ is the minimal risk. Thus, when $F_*$ is small we obtain an $\widetilde{O}(d/n)$ risk bound, which is analogous to the $\widetilde{O}(1/n)$ optimistic rate of ERM for supervised learning. Second, if the objective function is also $\lambda$-strongly convex, we prove an $\widetilde{O}(d/n + \kappa F_*/n)$ risk bound where $\kappa$ is the condition number, and improve it to $O(1/[\lambda n^2] + \kappa F_*/n)$ when $n=\widetilde{\Omega}(\kappa d)$. As a result, we obtain an $O(\kappa/n^2)$ risk bound under the condition that $n$ is large and $F_*$ is small, which to the best of our knowledge, is the first $O(1/n^2)$-type of risk bound of ERM. Third, we stress that the above results are established in a unified framework, which allows us to derive new risk bounds under weaker conditions, e.g., without convexity of the random function and Lipschitz continuity of the expected function. Finally, we demonstrate that to achieve an $O(1/[\lambda n^2] + \kappa F_*/n)$ risk bound for supervised learning, the $\widetilde{\Omega}(\kappa d)$ requirement on $n$ can be replaced with $\Omega(\kappa^2)$, which is dimensionality-independent.
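
The bounds stated in the abstract can be laid out side by side, which makes it easier to see how each added condition tightens the rate. The block below is only a restatement of the abstract, not a derivation from the paper. The final comment rests on two assumptions that the abstract does not spell out: that $\kappa = L/\lambda$ for a smoothness constant $L$ (the usual definition of a condition number), and that "$F_*$ is small" can be read as $F_* = O(1/n)$.

```latex
% Risk bounds as stated in the abstract.
% d: dimensionality, n: number of samples, F_*: minimal risk,
% \lambda: strong convexity modulus, \kappa: condition number.
\begin{align*}
\text{nonnegative, convex, smooth:} \quad
  & \widetilde{O}\!\left(\frac{d}{n} + \sqrt{\frac{F_*}{n}}\right) \\
\text{additionally $\lambda$-strongly convex:} \quad
  & \widetilde{O}\!\left(\frac{d}{n} + \frac{\kappa F_*}{n}\right) \\
\text{additionally } n = \widetilde{\Omega}(\kappa d): \quad
  & O\!\left(\frac{1}{\lambda n^2} + \frac{\kappa F_*}{n}\right)
\end{align*}
% Assumption (not in the abstract): \kappa = L/\lambda with L treated as a
% constant, and F_* = O(1/n). Then
%   1/(\lambda n^2) = \kappa/(L n^2)   and   \kappa F_*/n = O(\kappa/n^2),
% so the last bound reduces to the O(\kappa/n^2) rate quoted in the abstract.
```

For the supervised learning case, the abstract reports the same final bound with the sample-size requirement $n = \widetilde{\Omega}(\kappa d)$ replaced by $n = \Omega(\kappa^2)$.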

Citations (47)