Understanding the PDHG Algorithm via High-Resolution Differential Equations (2403.11139v1)
Abstract: The least absolute shrinkage and selection operator (Lasso) is widely recognized across various fields of mathematics and engineering. Its variant, the generalized Lasso, finds extensive application in statistics, machine learning, image science, and related areas. Among the optimization techniques used to tackle this problem, saddle-point methods stand out, with the primal-dual hybrid gradient (PDHG) algorithm emerging as a particularly popular choice. However, the iterative behavior of PDHG remains poorly understood. In this paper, we employ dimensional analysis to derive a system of high-resolution ordinary differential equations (ODEs) tailored for PDHG. This system effectively captures a key feature of PDHG, the coupled $x$-correction and $y$-correction, which distinguishes it from the proximal Arrow-Hurwicz algorithm. This small but essential perturbation ensures that PDHG consistently converges, bypassing the periodic behavior observed in the proximal Arrow-Hurwicz algorithm. Through Lyapunov analysis, we investigate the convergence behavior of the system of high-resolution ODEs and extend our insights to the discrete PDHG algorithm. Our analysis indicates that numerical errors resulting from the implicit scheme are a crucial factor affecting the convergence rate and monotonicity of PDHG, a noteworthy pattern also observed for the Alternating Direction Method of Multipliers (ADMM), as identified in [Li and Shi, 2024]. In addition, we discover that when one component of the objective function is strongly convex, the iterative average of PDHG converges strongly at a rate $O(1/N)$, where $N$ is the number of iterations.
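For concreteness, the sketch below implements the standard Chambolle-Pock form of PDHG (Chambolle and Pock, 2011, with extrapolation parameter $\theta = 1$) on a toy generalized-Lasso instance $\min_x \frac{1}{2}\|Ax-b\|^2 + \mu\|Dx\|_1$, recast as the saddle-point problem $\min_x \max_{\|y\|_\infty \le \mu} \frac{1}{2}\|Ax-b\|^2 + \langle Dx, y\rangle$. This is a minimal illustration, not the paper's exact scheme or experimental setup: the problem sizes, step sizes, and iteration count are illustrative assumptions. The extrapolated point $2x_{k+1} - x_k$ in the dual update is the coupled correction that the abstract contrasts with the proximal Arrow-Hurwicz algorithm.

```python
import numpy as np

def pdhg_generalized_lasso(A, b, D, mu, tau, sigma, iters=500):
    """PDHG (Chambolle-Pock, theta = 1) for the generalized Lasso
        min_x 0.5*||A x - b||^2 + mu*||D x||_1,
    via the saddle point
        min_x max_{||y||_inf <= mu} 0.5*||A x - b||^2 + <D x, y>.
    Step sizes should satisfy tau * sigma * ||D||^2 <= 1."""
    n = A.shape[1]
    x = np.zeros(n)
    y = np.zeros(D.shape[0])
    # The x-step is an implicit (proximal) step: it solves
    # (I + tau * A^T A) x_new = x - tau * D^T y + tau * A^T b.
    M = np.eye(n) + tau * (A.T @ A)
    Atb = A.T @ b
    for _ in range(iters):
        x_new = np.linalg.solve(M, x - tau * (D.T @ y) + tau * Atb)
        # Dual ascent at the extrapolated point 2*x_new - x, then
        # projection onto the box ||y||_inf <= mu; the extrapolation is
        # the coupled correction absent from proximal Arrow-Hurwicz.
        y = np.clip(y + sigma * (D @ (2.0 * x_new - x)), -mu, mu)
        x = x_new
    return x

# Toy fused-lasso instance: D is the (n-1) x n first-difference matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
n = A.shape[1]
D = (np.eye(n) - np.eye(n, k=1))[:-1]
L = np.linalg.norm(D, 2)  # spectral norm of D
x_hat = pdhg_generalized_lasso(A, b, D, mu=0.5, tau=0.9 / L, sigma=0.9 / L)
```

Note that the $x$-step above is implicit (a proximal step that requires solving a linear system), which connects to the abstract's observation that numerical errors from the implicit scheme are a crucial factor in the convergence rate and monotonicity of PDHG.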
- S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. Large deviations and gradient flows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(2005):20120341, 2013.
- K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-linear Programming. Stanford Mathematical Studies in the Social Sciences. Stanford University Press, 1958.
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
- S. P. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
- A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.
- E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
- A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.
- A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal-dual algorithm. Mathematical Programming, 159(1):253–287, 2016a.
- A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numerica, 25:161–319, 2016b.
- S. Chen, B. Shi, and Y.-x. Yuan. Gradient norm minimization of Nesterov acceleration: $o(1/k^3)$. arXiv preprint arXiv:2209.08862, 2022a.
- S. Chen, B. Shi, and Y.-x. Yuan. Revisiting the acceleration phenomenon via high-resolution differential equations. arXiv preprint arXiv:2212.05700, 2022b.
- S. Chen, B. Shi, and Y.-x. Yuan. On underdamped Nesterov's acceleration. arXiv preprint arXiv:2304.14642, 2023.
- D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
- D. L. Donoho and I. M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432):1200–1224, 1995.
- E. Esser, X. Zhang, and T. F. Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010.
- K. Feng. On difference schemes and symplectic geometry. In Proceedings of the 5th International Symposium on Differential Geometry and Differential Equations, 1984.
- D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
- R. Glowinski and A. Marroco. On the approximation, by finite elements of order one, and the resolution, by penalization-duality, of a class of nonlinear Dirichlet problems. French Journal of Automation, Computer Science, Operational Research. Numerical Analysis, 9(R2):41–76, 1975.
- E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, volume 31 of Springer Series in Computational Mathematics. Springer, 2nd edition, 2006.
- B. He and X. Yuan. Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences, 5(1):119–149, 2012.
- B. He, Y. You, and X. Yuan. On the convergence of primal-dual hybrid gradient algorithm. SIAM Journal on Imaging Sciences, 7(4):2526–2537, 2014.
- B. He, F. Ma, S. Xu, and X. Yuan. A generalized primal-dual algorithm with improved convergence condition for saddle point problems. SIAM Journal on Imaging Sciences, 15(3):1157–1183, 2022.
- S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky. $\ell_1$ trend filtering. SIAM Review, 51(2):339–360, 2009.
- G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.
- B. Li and B. Shi. Understanding the ADMM algorithm via high-resolution differential equations. arXiv preprint arXiv:2401.07096, 2024.
- B. Li, B. Shi, and Y.-x. Yuan. Linear convergence of ISTA and FISTA. arXiv preprint arXiv:2212.06319, 2022a.
- B. Li, B. Shi, and Y.-x. Yuan. Proximal subgradient norm minimization of ISTA and FISTA. arXiv preprint arXiv:2211.01610, 2022b.
- B. Li, B. Shi, and Y.-x. Yuan. Linear convergence of Nesterov-1983 with the strong convexity. arXiv preprint arXiv:2306.09694, 2023.
- A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, 1983.
- Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87. Springer Science & Business Media, 1998.
- L. D. Popov. A modification of the Arrow-Hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28:845–848, 1980.
- R. T. Rockafellar. Convex Analysis, volume 18. Princeton University Press, 1970.
- R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317. Springer Science & Business Media, 2009.
- L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
- B. Shi. On the hyperparameters in stochastic gradient descent with momentum. arXiv preprint arXiv:2108.03947, 2021.
- B. Shi, S. S. Du, W. J. Su, and M. I. Jordan. Acceleration via symplectic discretization of high-resolution differential equations. Advances in Neural Information Processing Systems, 32, 2019.
- B. Shi, S. S. Du, M. I. Jordan, and W. J. Su. Understanding the acceleration phenomenon via high-resolution differential equations. Mathematical Programming, 195(1-2):79–148, 2022.
- B. Shi, W. J. Su, and M. I. Jordan. On learning rates and Schrödinger operators. Journal of Machine Learning Research, 24(379):1–53, 2023.
- R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996.
- R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(1):91–108, 2005.
- R. J. Tibshirani. The solution path of the generalized lasso. PhD thesis, Stanford University, Palo Alto, CA, 2011.
- M. Zhu and T. Chan. An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report, 34(2), 2008.