Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization (2404.09438v1)
Abstract: In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for Lagrangian-based methods in which, at each iteration, the primal variables are updated by a single step of a subgradient method. These subgradient methods are "embedded" into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees of these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to solve constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including proximal SGD, proximal momentum SGD, and proximal Adam, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonsmooth nonconvex constrained optimization problems.
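To make the embedding idea concrete, the sketch below is our own minimal illustration (not the paper's exact algorithm) of a single-loop augmented-Lagrangian iteration: the primal variables take one projected-subgradient step per iteration, after which the multipliers are updated. All names (`augmented_lagrangian_sgd`, `subgrad_f`, `jac_c`, `project_X`) and parameter values are assumptions made purely for illustration.

```python
import numpy as np

def augmented_lagrangian_sgd(subgrad_f, c, jac_c, project_X, x0,
                             rho=10.0, eta=1e-3, dual_eta=1e-2, iters=1000):
    """Single-loop augmented-Lagrangian sketch: one projected-subgradient
    primal step per iteration, then a dual ascent step on the multipliers."""
    x = x0.astype(float)
    lam = np.zeros_like(c(x))
    for _ in range(iters):
        cx = c(x)
        # Subgradient of the augmented Lagrangian
        #   L_rho(x, lam) = f(x) + <lam, c(x)> + (rho / 2) * ||c(x)||^2
        g = subgrad_f(x) + jac_c(x).T @ (lam + rho * cx)
        # Single "embedded" primal update: plain projected SGD here; momentum
        # or Adam-style updates could be swapped in as black boxes.
        x = project_X(x - eta * g)
        # Dual update on the Lagrange multipliers.
        lam = lam + dual_eta * c(x)
    return x, lam

# Illustrative use: minimize ||x||_1 subject to sum(x) = 1 over the box [-1, 1]^n.
subgrad_f = lambda x: np.sign(x)                  # a subgradient of ||x||_1
c_fn = lambda x: np.array([np.sum(x) - 1.0])      # c(x) = sum(x) - 1
jac_c = lambda x: np.ones((1, x.size))            # Jacobian of c
project_X = lambda x: np.clip(x, -1.0, 1.0)       # projection onto the box
x_hat, lam_hat = augmented_lagrangian_sgd(subgrad_f, c_fn, jac_c, project_X,
                                          x0=np.zeros(5), iters=5000)
```

In the framework described above, the plain projected-subgradient step is the place where stochastic updates such as proximal SGD, proximal momentum SGD, or proximal Adam would be embedded as black boxes.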
- Nachuan Xiao
- Kuangyu Ding
- Xiaoyin Hu
- Kim-Chuan Toh