Fast Robust Kernel Regression through Sign Gradient Descent with Early Stopping (2306.16838v6)

Published 29 Jun 2023 in stat.ML, cs.LG, math.OC, and stat.ME

Abstract: Kernel ridge regression (KRR) is a generalization of linear ridge regression that is non-linear in the data, but linear in the model parameters. Here, we introduce an equivalent formulation of the objective function of KRR, which makes it possible both to replace the ridge penalty with the $\ell_\infty$ and $\ell_1$ penalties and to study kernel ridge regression from the perspective of gradient descent. Using the $\ell_\infty$ and $\ell_1$ penalties, we obtain robust and sparse kernel regression, respectively. We further study the similarities between explicitly regularized kernel regression and the solutions obtained by early stopping of iterative gradient-based methods, connecting $\ell_\infty$ regularization to sign gradient descent, $\ell_1$ regularization to forward stagewise regression (also known as coordinate descent), and $\ell_2$ regularization to gradient descent, and, in the last case, theoretically bounding the differences. We exploit the close relations between $\ell_\infty$ regularization and sign gradient descent, and between $\ell_1$ regularization and coordinate descent, to propose computationally efficient methods for robust and sparse kernel regression. We finally compare robust kernel regression through sign gradient descent to existing methods for robust kernel regression on five real data sets, demonstrating that our method is one to two orders of magnitude faster, without compromising accuracy.
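The abstract's central algorithmic idea, running sign gradient descent on a kernel least-squares objective and letting early stopping play the role of an explicit $\ell_\infty$ penalty, can be sketched in a few lines of NumPy. The sketch below is illustrative only: it uses a plain squared loss with a Gaussian kernel and a simple hold-out stopping rule, not the paper's reformulated objective or its stopping criterion, and the bandwidth, step size, and iteration budget are arbitrary choices.

```python
import numpy as np


def gaussian_gram(X1, X2, bandwidth=1.0):
    """Gram matrix of a Gaussian (RBF) kernel between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def sign_gd_kernel_regression(K, y, K_val, y_val, step_size=1e-3, max_steps=5000):
    """Sign gradient descent on the unregularized kernel least-squares loss
    0.5 * ||y - K @ alpha||^2, stopped early via a hold-out set.

    Each step moves every coordinate of alpha by a fixed amount against the
    sign of its gradient; the number of steps taken before the hold-out error
    stops improving acts as the regularization knob (a stand-in for the
    paper's connection between early stopping and l_infinity regularization).
    """
    alpha = np.zeros(K.shape[1])
    best_alpha, best_err = alpha.copy(), np.inf
    for _ in range(max_steps):
        grad = K.T @ (K @ alpha - y)               # gradient of the squared loss
        alpha = alpha - step_size * np.sign(grad)  # equal-magnitude coordinate steps
        val_err = np.mean((y_val - K_val @ alpha) ** 2)
        if val_err < best_err:
            best_err, best_alpha = val_err, alpha.copy()
    return best_alpha


# Toy usage (hypothetical data): fit y = sin(x) with a few gross outliers
# in the training targets, which a robust fit should largely resist.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
y[:5] += 5.0
X_val = rng.uniform(-3, 3, size=(40, 1))
y_val = np.sin(X_val[:, 0])

K = gaussian_gram(X, X, bandwidth=0.5)
K_val = gaussian_gram(X_val, X, bandwidth=0.5)
alpha = sign_gd_kernel_regression(K, y, K_val, y_val)
print("validation MSE:", np.mean((y_val - K_val @ alpha) ** 2))
```

The same scaffolding would accommodate the abstract's other correspondence, $\ell_1$ regularization and forward stagewise regression, by updating only the coordinate with the largest gradient magnitude at each step instead of all coordinates at once.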
