Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy (2405.09927v1)
Abstract: This work addresses two major challenges in large-scale nonconvex Bi-Level Optimization (BLO) problems, which are increasingly applied in machine learning for their ability to model nested structures: ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have relied primarily on lower-level convexity assumptions, our work tackles large-scale BLO problems with nonconvexity in both the upper and lower levels. We address the computational and theoretical challenges simultaneously by introducing a single-loop gradient-based algorithm built on a Moreau envelope-based reformulation, together with a non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, our algorithm relies solely on first-order gradient information, enhancing its practicality and efficiency, especially for large-scale BLO learning tasks. We validate the approach's effectiveness through experiments on various synthetic problems, two typical hyperparameter learning tasks, and a real-world neural architecture search application, collectively demonstrating its superior performance.
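To make the abstract's central construction concrete, here is a hedged sketch of the standard Moreau envelope-based reformulation for BLO; the notation below follows common usage in this literature and is not taken verbatim from the paper. For a lower-level objective $f(x,\theta)$ (upper-level variable $x$, lower-level variable $\theta$) and a smoothing parameter $\gamma > 0$, the Moreau envelope of the lower-level problem is

```latex
% Moreau envelope of the lower-level objective f(x, .) at the point y,
% with smoothing parameter gamma > 0 (notation assumed, not the paper's):
v_\gamma(x, y) \;=\; \min_{\theta}\;\Bigl\{\, f(x,\theta) \;+\; \tfrac{1}{2\gamma}\,\lVert \theta - y \rVert^{2} \,\Bigr\}.
```

Under this construction, the intractable lower-level optimality constraint can be replaced by the single inequality $f(x,y) - v_\gamma(x,y) \le 0$, which involves only function values and gradients of $f$; this is what makes a first-order, Hessian-free single-loop scheme possible, in contrast to implicit-differentiation approaches that require Hessian-vector products.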