
Understanding the ADMM Algorithm via High-Resolution Differential Equations (2401.07096v1)

Published 13 Jan 2024 in math.OC, cs.NA, and math.NA

Abstract: In the fields of statistics, machine learning, image science, and related areas, there is an increasing demand for decentralized collection or storage of large-scale datasets, as well as distributed solution methods. To tackle this challenge, the alternating direction method of multipliers (ADMM) has emerged as a widely used approach, particularly well-suited to distributed convex optimization. However, the iterative behavior of ADMM has not been well understood. In this paper, we employ dimensional analysis to derive a system of high-resolution ordinary differential equations (ODEs) for ADMM. This system captures an important characteristic of ADMM, called the $\lambda$-correction, which causes the trajectory of ADMM to deviate from the constrained hyperplane. To explore the convergence behavior of the system of high-resolution ODEs, we utilize Lyapunov analysis and extend our findings to the discrete ADMM algorithm. Through this analysis, we identify that the numerical error resulting from the implicit scheme is a crucial factor that affects the convergence rate and monotonicity in the discrete ADMM algorithm. In addition, we further discover that if one component of the objective function is assumed to be strongly convex, the iterative average of ADMM converges strongly with a rate $O(1/N)$, where $N$ is the number of iterations.
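
As a concrete illustration of the algorithm the abstract refers to, below is a minimal sketch of the standard (unscaled) ADMM iteration applied to the lasso problem $\min_{x,z} \tfrac{1}{2}\|Ax-b\|^2 + \mu\|z\|_1$ subject to $x - z = 0$. The choice of lasso as the test problem and the parameters mu, rho, and n_iter are illustrative assumptions; the snippet shows only the generic x-, z-, and multiplier updates and does not reproduce the paper's high-resolution ODE system or Lyapunov analysis.

# Minimal ADMM sketch for the lasso problem (illustrative parameters, not the paper's setup).
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa*||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, mu=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    lam = np.zeros(n)                      # Lagrange multiplier for x - z = 0
    # Factor A^T A + rho*I once; it is reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: minimize 0.5*||Ax-b||^2 + lam^T x + (rho/2)*||x - z||^2
        rhs = Atb + rho * z - lam
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: proximal step on the l1 term
        z = soft_threshold(x + lam / rho, mu / rho)
        # multiplier update: dual ascent on the constraint residual x - z
        lam = lam + rho * (x - z)
    return x, z

# Toy usage on random data with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = rng.standard_normal(20) * (rng.random(20) < 0.3)
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat, z_hat = admm_lasso(A, b)

Factoring $A^\top A + \rho I$ once and reusing it keeps the per-iteration cost at two triangular solves plus an elementwise soft-thresholding, which is a common implementation choice for this problem class.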

Authors (2)
  1. Bowen Li (166 papers)
  2. Bin Shi (38 papers)
Citations (2)
