
Projection-free Distributed Online Learning with Sublinear Communication Complexity (2103.11102v2)

Published 20 Mar 2021 in cs.LG and stat.ML

Abstract: To deal with complicated constraints via locally light computations in distributed online learning, a recent study has presented a projection-free algorithm called distributed online conditional gradient (D-OCG), and achieved an $O(T^{3/4})$ regret bound for convex losses, where $T$ is the number of total rounds. However, it requires $T$ communication rounds, and cannot utilize the strong convexity of losses. In this paper, we propose an improved variant of D-OCG, namely D-BOCG, which can attain the same $O(T^{3/4})$ regret bound with only $O(\sqrt{T})$ communication rounds for convex losses, and a better regret bound of $O(T^{2/3}(\log T)^{1/3})$ with fewer $O(T^{1/3}(\log T)^{2/3})$ communication rounds for strongly convex losses. The key idea is to adopt a delayed update mechanism that reduces the communication complexity, and redefine the surrogate loss function in D-OCG for exploiting the strong convexity. Furthermore, we provide lower bounds to demonstrate that the $O(\sqrt{T})$ communication rounds required by D-BOCG are optimal (in terms of $T$) for achieving the $O(T^{3/4})$ regret with convex losses, and the $O(T^{1/3}(\log T)^{2/3})$ communication rounds required by D-BOCG are near-optimal (in terms of $T$) for achieving the $O(T^{2/3}(\log T)^{1/3})$ regret with strongly convex losses up to polylogarithmic factors. Finally, to handle the more challenging bandit setting, in which only the loss value is available, we incorporate the classical one-point gradient estimator into D-BOCG, and obtain similar theoretical guarantees.
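The delayed update idea in the abstract can be illustrated with a rough single-learner sketch. This is not the paper's D-BOCG algorithm: there is no network of learners, and the feasible set (an L1 ball), the surrogate $\eta \langle \text{gradient sum}, x \rangle + \|x - x_1\|^2$, the step size $\sigma$, the value of $\eta$, and the names `lmo_l1` and `blocked_ocg` are all assumptions chosen for illustration. The only point it shows is that playing a fixed decision within each block of about $\sqrt{T}$ rounds, and running a projection-free (linear optimization) step once per block, yields about $\sqrt{T}$ update points. In a distributed setting, those are the points at which communication would happen.

```python
import math


def lmo_l1(grad, radius=1.0):
    """Linear optimization oracle: argmin of <grad, v> over the L1 ball."""
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        v[i] = -radius if grad[i] > 0 else radius
    return v


def blocked_ocg(gradient_fn, dim, T, block_size=None, eta=None, radius=1.0):
    """Single-learner sketch of a blocked online conditional gradient loop.

    Hypothetical simplification: one Frank-Wolfe step per block with an
    assumed step size, instead of whatever inner procedure the paper uses.
    """
    K = block_size if block_size is not None else max(1, math.isqrt(T))
    eta = eta if eta is not None else T ** -0.75  # illustrative choice only
    x1 = [0.0] * dim
    x = list(x1)
    grad_sum = [0.0] * dim  # gradients from all finished blocks
    buffer = [0.0] * dim    # gradients collected in the current block
    plays, updates = [], 0

    for t in range(1, T + 1):
        plays.append(list(x))  # the decision stays fixed inside a block
        g = gradient_fn(t, x)
        buffer = [b + gi for b, gi in zip(buffer, g)]

        if t % K == 0:  # end of block: the only point where x changes
            grad_sum = [s + b for s, b in zip(grad_sum, buffer)]
            buffer = [0.0] * dim
            updates += 1
            # Gradient of the assumed surrogate eta*<grad_sum, y> + ||y - x1||^2
            surrogate_grad = [
                eta * s + 2.0 * (xi - x1i)
                for s, xi, x1i in zip(grad_sum, x, x1)
            ]
            v = lmo_l1(surrogate_grad, radius)
            sigma = 2.0 / (updates + 2.0)  # assumed step size, at most 2/3
            # Convex combination of feasible points, so no projection is needed
            x = [xi + sigma * (vi - xi) for xi, vi in zip(x, v)]

    return plays, updates


if __name__ == "__main__":
    target = [0.5, -0.25, 0.0]
    # Gradient of the toy loss 0.5 * ||x - target||^2
    grad = lambda t, x: [xi - zi for xi, zi in zip(x, target)]
    plays, updates = blocked_ocg(grad, dim=3, T=100)
    print(f"rounds: {len(plays)}, update points: {updates}")
```

With `T=100` the default block size is 10, so the loop performs 10 updates over 100 rounds. Every iterate stays feasible because each update is a convex combination of points in the ball, which is what makes the method projection-free.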
