High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile (2403.20200v2)
Abstract: High-dimensional linear regression has been thoroughly studied for independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data. To this end, we assume that the matrix of observed predictors (or features) is a random matrix with a variance profile, whose dimensions grow at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degrees of freedom of the ridge estimator. For a certain class of variance profiles, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum-norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results rely on tools from random matrix theory in the presence of a variance profile that have not previously been used to study regression models. Numerical experiments illustrate the accuracy of these deterministic equivalents for computing the predictive risk of ridge regression. We also investigate the similarities and differences with the standard setting of independent and identically distributed data.
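The setting above can be explored with a small Monte Carlo experiment. The sketch below is not the authors' code and does not compute their deterministic equivalents. It estimates, by simulation, the quantities those equivalents approximate: the predictive risk and the degrees of freedom of ridge regression, with λ = 0 giving the minimum-norm least-squares estimator. Several choices here are assumptions made for illustration. The profile function `variance_profile` is hypothetical. The entries are unscaled with variance σ²ᵢⱼ. The penalty is written as n·λ. The random effect is β ~ N(0, α²/p·I). "Risk" is taken to mean the excess error on a fresh row drawn with the same variance profile, averaged over the n row types.

```python
import numpy as np


def variance_profile(n, p):
    """Hypothetical variance profile S2[i, j] = profile(i/n, j/p).

    Illustrative assumption, not taken from the paper: rows in the first half
    of the sample have twice the variance of rows in the second half, and the
    variance is modulated smoothly across features.
    """
    u = (np.arange(n) + 0.5) / n  # row positions in (0, 1)
    v = (np.arange(p) + 0.5) / p  # column positions in (0, 1)
    row_scale = np.where(u < 0.5, 2.0, 1.0)
    col_scale = 1.0 + 0.5 * np.cos(2 * np.pi * v)  # values in [0.5, 1.5]
    return np.outer(row_scale, col_scale)  # n x p matrix of variances


def simulate_ridge(n, p, lam, alpha=1.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of ridge risk and degrees of freedom.

    Assumed model (for illustration only):
      X[i, j] = sqrt(S2[i, j]) * Z[i, j], with Z[i, j] i.i.d. N(0, 1)
      beta ~ N(0, alpha^2 / p * I_p)          (random effect model)
      y = X beta + eps, eps ~ N(0, sigma^2 I_n)
      beta_hat = (X^T X + n * lam * I)^{-1} X^T y for lam > 0,
      and the minimum-norm least-squares solution for lam == 0.
    """
    rng = np.random.default_rng(seed)
    S2 = variance_profile(n, p)
    X = np.sqrt(S2) * rng.standard_normal((n, p))
    beta = rng.normal(0.0, alpha / np.sqrt(p), size=p)
    y = X @ beta + sigma * rng.standard_normal(n)

    if lam > 0:
        A = X.T @ X + n * lam * np.eye(p)
        beta_hat = np.linalg.solve(A, X.T @ y)
        hat = X @ np.linalg.solve(A, X.T)  # ridge hat matrix
    else:
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # min-norm solution
        hat = X @ np.linalg.pinv(X)  # orthogonal projection onto col(X)

    d = beta_hat - beta
    # A fresh row x with independent zero-mean entries of variance S2[i, j]
    # satisfies E[(x^T d)^2 | d] = sum_j S2[i, j] * d_j^2; average over i.
    risk = float(S2.mean(axis=0) @ d**2)
    df = float(np.trace(hat))  # degrees of freedom = trace of hat matrix
    return risk, df


if __name__ == "__main__":
    n = 200
    for p in (40, 100, 190, 400):
        for lam in (0.0, 0.1):
            risk, df = simulate_ridge(n, p, lam)
            print(f"p/n={p / n:.2f} lam={lam:.1f} risk={risk:.3f} df={df:.1f}")
```

With λ = 0, the degrees of freedom equal min(n, p), since the hat matrix is then a projection of that rank. Sweeping p/n through 1 for a given profile shows whether the minimum-norm risk peaks near the interpolation threshold, as in double descent, or takes a different shape. Averaging over several seeds gives smoother curves.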