Robust Mean Estimation in High Dimensions: An Outlier Fraction Agnostic and Efficient Algorithm (2102.08573v5)
Abstract: The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the $\ell_0$-"norm" of an outlier indicator vector, under a second-moment constraint on the datapoints. The $\ell_0$-"norm" is then relaxed to the $\ell_p$-norm ($0<p\leq 1$) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have the optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative $\ell_p$-minimization and hard-thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point $\approx 0.3$) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for $p=1$ it has near-linear time complexity. Both synthetic and real-data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
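To make the idea of an outlier indicator vector and a second-moment stopping rule concrete, here is a small illustrative sketch. It is a simplified stand-in, not the authors' algorithm. The paper's $\ell_p$-minimization step is replaced by a plain spectral score, in the spirit of filtering methods. What remains is hard thresholding of a boolean indicator $h$ until the retained points satisfy a covariance bound. The helper name `threshold_mean`, the assumed known variance bound `sigma2`, and the slack factor `c` are hypothetical choices made for this example. Like the proposed method, the sketch never takes the outlier fraction as input. For a binary indicator $h$, $\|h\|_p^p=\|h\|_0$ for every $p>0$, so an $\ell_p$ objective differs from the $\ell_0$ one only on fractional weights.

```python
import numpy as np


def threshold_mean(X, sigma2=1.0, c=4.0, max_iter=100):
    """Hypothetical robust-mean sketch (not the paper's algorithm).

    Grows a boolean outlier indicator h by hard thresholding until the
    largest eigenvalue of the retained points' covariance is at most
    c * sigma2 (an assumed form of the second-moment constraint).
    The outlier fraction is never passed in.
    """
    h = np.zeros(len(X), dtype=bool)  # h[i] is True when point i is flagged as an outlier
    for _ in range(max_iter):
        kept = X[~h]
        mu = kept.mean(axis=0)
        centered = kept - mu
        cov = centered.T @ centered / len(kept)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam <= c * sigma2:
            break  # constraint satisfied: stop flagging points
        # Squared deviation of every point along the worst direction v.
        scores = ((X - mu) @ v) ** 2
        scores[h] = -np.inf  # exclude already-flagged points from the new test
        # Retained scores average exactly lam, so at least one reaches it
        # (up to rounding); fall back to the single largest score otherwise.
        new = scores >= lam
        if not new.any():
            new[np.argmax(scores)] = True
        h |= new  # hard thresholding: flagged points stay flagged
    return X[~h].mean(axis=0), h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.standard_normal((1000, 10))  # identity-covariance inliers
    outliers = np.zeros((200, 10))
    outliers[:, 0] = 10.0  # a tight cluster of corrupted points
    X = np.vstack([inliers, outliers])
    est, h = threshold_mean(X)
    print("naive mean error:", np.linalg.norm(X.mean(axis=0)))
    print("sketch error:", np.linalg.norm(est), "flagged:", int(h.sum()))
```

The mean of the retained points' scores equals the top eigenvalue, so each pass flags at least one point. The loop therefore terminates. A real implementation would replace the spectral score with the weight-finding step described in the paper.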