Minimax Bounds for Distributed Logistic Regression (1910.01625v1)
Abstract: We consider a distributed logistic regression problem where labeled data pairs $(X_i,Y_i)\in \mathbb{R}^d\times\{-1,1\}$ for $i=1,\ldots,n$ are distributed across multiple machines in a network and must be communicated to a centralized estimator using at most $k$ bits per labeled pair. We assume that the data $X_i$ come independently from some distribution $P_X$, and that the distribution of $Y_i$ conditioned on $X_i$ follows a logistic model with some parameter $\theta\in\mathbb{R}^d$. By using a Fisher information argument, we give minimax lower bounds for estimating $\theta$ under different assumptions on the tail of the distribution $P_X$. We consider both $\ell^2$ and logistic losses, and show that for the logistic loss our sub-Gaussian lower bound is order-optimal and cannot be improved.
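To make the setup concrete, the following is a minimal simulation sketch, not the paper's construction. It assumes the standard logistic form $P(Y=1\mid X=x) = 1/(1+e^{-\theta^\top x})$ and a standard Gaussian $P_X$ as one example of a sub-Gaussian design. The function names `sample_pair` and `encode_k_bits`, the parameter values, and the sign-based encoding are all hypothetical illustrations of a "$k$ bits per labeled pair" constraint.

```python
import math
import random


def sample_pair(theta, rng):
    """Draw one (x, y): x ~ N(0, I_d) (an assumed P_X), y in {-1, +1} from a logistic model."""
    x = [rng.gauss(0.0, 1.0) for _ in theta]
    margin = sum(t * xi for t, xi in zip(theta, x))
    p_plus = 1.0 / (1.0 + math.exp(-margin))  # P(Y = +1 | X = x)
    y = 1 if rng.random() < p_plus else -1
    return x, y


def encode_k_bits(x, y, k):
    """Hypothetical encoder: one bit for the label, then sign bits of the first k-1 coordinates."""
    bits = [1 if y == 1 else 0]
    bits += [1 if xi >= 0 else 0 for xi in x[:k - 1]]
    return bits  # at most k bits per labeled pair


rng = random.Random(0)      # fixed seed for reproducibility
theta = [1.0, -0.5, 0.25]   # illustrative parameter, d = 3
k = 2                       # illustrative bit budget per pair
messages = [encode_k_bits(*sample_pair(theta, rng), k) for _ in range(1000)]
```

The centralized estimator would see only `messages`, never the raw pairs; the lower bounds in the paper concern how well $\theta$ can be estimated from any such $k$-bit messages, not from this particular encoding.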