Improved error rates for sparse (group) learning with Lipschitz loss functions

Published Oct 20, 2019 in stat.ML, cs.LG, and stat.OT


We study a family of sparse estimators defined as minimizers of an empirical Lipschitz loss function (examples include the hinge loss, the logistic loss, and the quantile regression loss) with a convex, sparse or group-sparse regularization. In particular, we consider the L1 norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. We propose a new theoretical framework that uses common assumptions in the literature to simultaneously derive new high-dimensional L2 estimation upper bounds for all three regularization schemes. For L1 and Slope regularizations, our bounds scale as $(k^*/n) \log(p/k^*)$, where $n \times p$ is the size of the design matrix and $k^*$ the dimension of the theoretical loss minimizer $\boldsymbol{\beta}^*$, and match the optimal minimax rate achieved for the least-squares case. For Group L1-L2 regularization, our bounds scale as $(s^*/n) \log\left( G / s^* \right) + m^* / n$, where $G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\boldsymbol{\beta}^*$, and improve over the least-squares case. We show that, when the signal is strongly group-sparse, Group L1-L2 is superior to L1 and Slope. In addition, we adapt our approach to the sub-Gaussian linear regression framework and reach the optimal minimax rate for Lasso, and an improved rate for Group-Lasso. Finally, we release an accelerated proximal algorithm that computes the nine main convex estimators of interest when the number of variables is on the order of 100,000s.
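To make the comparison between the two rates concrete, the following minimal sketch evaluates both expressions from the abstract for one strongly group-sparse configuration. All numerical values (n, p, G, s, m, k) are hypothetical and chosen only for illustration, the function names are not from the paper, and the unspecified constants in the bounds are ignored, so the output indicates how the rates scale rather than any actual error level.

```python
import math


def l1_slope_rate(k, n, p):
    """Rate (k/n) * log(p/k) stated for L1 and Slope, constants omitted."""
    return (k / n) * math.log(p / k)


def group_l1_l2_rate(s, m, n, G):
    """Rate (s/n) * log(G/s) + m/n stated for Group L1-L2, constants omitted."""
    return (s / n) * math.log(G / s) + m / n


# Hypothetical setting (not taken from the paper):
n, p = 1_000, 100_000   # design matrix of size n x p
G = 10_000              # p / G = 10 coefficients per group
s = 5                   # number of active groups
m = 50                  # coefficients contained in the active groups
k = 50                  # nonzero coefficients; k == m, i.e. strongly group-sparse

r_l1 = l1_slope_rate(k, n, p)
r_group = group_l1_l2_rate(s, m, n, G)

print(f"L1 / Slope rate:  {r_l1:.4f}")
print(f"Group L1-L2 rate: {r_group:.4f}")
```

In this configuration the logarithmic factor is paid once per active group rather than once per nonzero coefficient, which is why the group rate comes out smaller. When the active groups are mostly empty (m much larger than k), the m/n term can dominate and the ordering may reverse.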

