Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 49 tok/s
Gemini 2.5 Pro 53 tok/s Pro
GPT-5 Medium 19 tok/s Pro
GPT-5 High 16 tok/s Pro
GPT-4o 103 tok/s Pro
Kimi K2 172 tok/s Pro
GPT OSS 120B 472 tok/s Pro
Claude Sonnet 4 39 tok/s Pro
2000 character limit reached

On Momentum-Based Gradient Methods for Bilevel Optimization with Nonconvex Lower-Level (2303.03944v4)

Published 7 Mar 2023 in math.OC, cs.LG, cs.NA, and math.NA

Abstract: Bilevel optimization is a popular two-level hierarchical optimization, which has been widely applied to many machine learning tasks such as hyperparameter learning, meta learning and continual learning. Although many bilevel optimization methods recently have been developed, the bilevel methods are not well studied when the lower-level problem is nonconvex. To fill this gap, in the paper, we study a class of nonconvex bilevel optimization problems, where both upper-level and lower-level problems are nonconvex, and the lower-level problem satisfies Polyak-{\L}ojasiewicz (PL) condition. We propose an efficient momentum-based gradient bilevel method (MGBiO) to solve these deterministic problems. Meanwhile, we propose a class of efficient momentum-based stochastic gradient bilevel methods (MSGBiO and VR-MSGBiO) to solve these stochastic problems. Moreover, we provide a useful convergence analysis framework for our methods. Specifically, under some mild conditions, we prove that our MGBiO method has a sample (or gradient) complexity of $O(\epsilon{-2})$ for finding an $\epsilon$-stationary solution of the deterministic bilevel problems (i.e., $|\nabla F(x)|\leq \epsilon$), which improves the existing best results by a factor of $O(\epsilon{-1})$. Meanwhile, we prove that our MSGBiO and VR-MSGBiO methods have sample complexities of $\tilde{O}(\epsilon{-4})$ and $\tilde{O}(\epsilon{-3})$, respectively, in finding an $\epsilon$-stationary solution of the stochastic bilevel problems (i.e., $\mathbb{E}|\nabla F(x)|\leq \epsilon$), which improves the existing best results by a factor of $\tilde{O}(\epsilon{-3})$. Extensive experimental results on bilevel PL game and hyper-representation learning demonstrate the efficiency of our algorithms. This paper commemorates the mathematician Boris Polyak (1935 -2023).

Citations (16)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)