
Statistical consistency and asymptotic normality for high-dimensional robust M-estimators

(1501.00312)
Published Jan 1, 2015 in math.ST, cs.IT, math.IT, stat.ML, and stat.TH

Abstract

We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an l_1-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency.
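As a concrete illustration of the two-step procedure the abstract describes, below is a minimal Python sketch (not the authors' code). It pairs a Huber loss, whose derivative is bounded as the consistency theory requires, with an l_1 penalty in the convex first step and an MCP penalty in the nonconvex second step, solving each via composite gradient descent. The tuning constants (delta, eta, lam, gamma) and all function names are illustrative assumptions, not values or code from the paper.

```python
# Sketch of the two-step regularized robust M-estimation procedure:
# Step 1: convex problem (Huber loss + l1 penalty) for a consistent initializer.
# Step 2: nonconvex problem (Huber loss + MCP penalty), started at the Step-1
#         solution so composite gradient descent stays in the local region.
import numpy as np

def huber_grad(X, y, beta, delta=1.345):
    """Gradient of the Huber loss (1/n) * sum_i l(y_i - x_i' beta); the
    derivative psi is bounded, as the local consistency theory requires."""
    r = y - X @ beta
    psi = np.clip(r, -delta, delta)          # bounded score function
    return -X.T @ psi / len(y)

def prox_l1(v, eta, lam):
    """Soft-thresholding: proximal operator of eta * lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0)

def prox_mcp(v, eta, lam, gamma=3.0):
    """Proximal operator of the (nonconvex) MCP penalty, step size eta < gamma.
    Coordinates beyond gamma*lam are left unpenalized, removing l1 shrinkage bias."""
    shrunk = np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0) / (1.0 - eta / gamma)
    return np.where(np.abs(v) <= gamma * lam, shrunk, v)

def composite_gd(X, y, lam, beta0, prox, eta=0.1, iters=500):
    """Composite gradient descent: gradient step on the smooth loss,
    proximal step on the (possibly nonconvex) penalty."""
    beta = beta0.copy()
    for _ in range(iters):
        beta = prox(beta - eta * huber_grad(X, y, beta), eta, lam)
    return beta

def two_step(X, y, lam, eta=0.1):
    """Convex M-estimator for consistency, nonconvex M-estimator for efficiency."""
    p = X.shape[1]
    beta_init = composite_gd(X, y, lam, np.zeros(p), prox_l1, eta)
    return composite_gd(X, y, lam, beta_init, prox_mcp, eta)

# Toy usage: sparse linear model with heavy-tailed (Student-t) errors.
rng = np.random.default_rng(0)
n, p, s = 200, 500, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + rng.standard_t(df=2, size=n)
beta_hat = two_step(X, y, lam=0.2)
```

In this sketch, the step size eta is kept below gamma so the MCP proximal map is well defined, and the Step-1 solution plays the role of the constant-radius initialization that the abstract's linear-convergence guarantee assumes.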
