Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience (2103.09424v2)

Published 17 Mar 2021 in cs.DC, cs.LG, math.OC, and stat.ML

Abstract: The problem of saddle-point avoidance for non-convex optimization is quite challenging in large scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of \cite{nest} is one of the most elegant ways to avoid saddle-points in the standard centralized (non-distributed) setup. In this paper, we extend the cubic-regularized Newton method to a distributed framework and simultaneously address several practical challenges like communication bottleneck and Byzantine attacks. Note that the issue of saddle-point avoidance becomes more crucial in the presence of Byzantine machines since rogue machines may create \emph{fake local minima} near the saddle-points of the loss function, also known as the saddle-point attack. Being a second order algorithm, our iteration complexity is much lower than the first order counterparts. Furthermore we use compression (or sparsification) techniques like $\delta$-approximate compression for communication efficiency. We obtain theoretical guarantees for our proposed scheme under several settings including approximate (sub-sampled) gradients and Hessians. Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks, and obtain an improvement of $25\%$ with respect to first order methods in iteration complexity.

Citations (5)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience (2103.09424v2)

Summary

Related Papers