Stochastic Optimization with Constraints: A Non-asymptotic Instance-Dependent Analysis (2404.00042v1)

Published 24 Mar 2024 in math.OC, cs.AI, cs.LG, and stat.ML

Abstract: We consider the problem of stochastic convex optimization under convex constraints. We analyze the behavior of a natural variance-reduced proximal gradient (VRPG) algorithm for this problem. Our main result is a non-asymptotic guarantee for the VRPG algorithm. In contrast to minimax worst-case guarantees, our result is instance-dependent in nature. This means that our guarantee captures the complexity of the loss function, the variability of the noise, and the geometry of the constraint set. We show that the non-asymptotic performance of the VRPG algorithm is governed by the scaled distance (scaled by $\sqrt{N}$) between the solution of the given problem and that of a certain small perturbation of the given problem -- both solved under the given convex constraints; here, $N$ denotes the number of samples. Leveraging a well-established connection between local minimax lower bounds and solutions to perturbed problems, we show that as $N \rightarrow \infty$, the VRPG algorithm achieves the renowned local minimax lower bound by Hájek and Le Cam up to universal constants and a logarithmic factor of the sample size.
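For readers who want a concrete picture of the algorithm family analyzed above, the following is a minimal sketch of a generic variance-reduced proximal gradient loop under a convex constraint, in the spirit of proximal SVRG (Xiao and Zhang, reference 23). It is not the paper's exact VRPG procedure: the l2-ball constraint, the projection used as the proximal map, the step size, and the epoch schedule are all illustrative assumptions.

import numpy as np

def project(x, radius=1.0):
    # Euclidean projection onto an l2 ball; stands in for the proximal map
    # (prox of the indicator) of a generic convex constraint set.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def vr_prox_gradient(grad_i, n_samples, dim, step=0.1, epochs=20, inner=None):
    # grad_i(i, x): gradient of the i-th sample's loss at the point x.
    inner = inner or n_samples
    x = np.zeros(dim)
    for _ in range(epochs):
        x_ref = x.copy()
        # Full gradient at the reference point: the variance-reduction anchor.
        full_grad = np.mean([grad_i(i, x_ref) for i in range(n_samples)], axis=0)
        for _ in range(inner):
            i = np.random.randint(n_samples)
            # Variance-reduced stochastic gradient estimate.
            g = grad_i(i, x) - grad_i(i, x_ref) + full_grad
            # Proximal (here: projected) gradient step onto the constraint set.
            x = project(x - step * g)
    return x

# Example usage on hypothetical data: constrained least squares, one loss term per sample.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = vr_prox_gradient(grad_i, n_samples=200, dim=5)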

References (23)
  1. A framework for estimation of convex functions. Statistica Sinica, pages 423–456, 2015.
  2. Asymptotic normality and optimality in nonsmooth stochastic approximation. arXiv preprint arXiv:2301.06632, 2023.
  3. Implicit functions and solution mappings: A view from variational analysis, volume 616. Springer, 2009.
  4. Local minimax complexity of stochastic convex optimization. Advances in Neural Information Processing Systems, 29, 2016.
  5. J. C. Duchi and F. Ruan. Asymptotic optimality in stochastic optimization. 2021.
  6. J. Dupacová and R. Wets. Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. The Annals of Statistics, 16(4):1517–1549, 1988.
  7. J. Hájek. Local asymptotic minimax and admissibility in estimation. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 175–194, 1972.
  8. J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms I: Fundamentals, volume 305. Springer Science & Business Media, 1996.
  9. R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems, 26, 2013.
  10. Is temporal difference learning optimal? An instance-dependent analysis. SIAM Journal on Mathematics of Data Science, 3(4):1013–1040, 2021.
  11. Instance-optimality in optimal value estimation: Adaptivity via variance-reduced q-learning. arXiv preprint arXiv:2106.14352, 2021.
  12. A. J. King. Asymptotic behaviour of solutions in stochastic optimization: nonsmooth analysis and the derivation of non-normal limit distributions (least squares). 1986.
  13. Asymptotic theory for solutions in statistical estimation and stochastic programming. Mathematics of Operations Research, 18(1):148–162, 1993.
  14. L. Le Cam et al. Limits of experiments. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 245–261. University of California Press, 1972.
  15. Asymptotics in statistics: some basic concepts. 2000.
  16. Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
  17. R. Poliquin and R. T. Rockafellar. Tilt stability of a local minimum. SIAM Journal on Optimization, 8(2):287–299, 1998.
  18. A. Shapiro. Asymptotic properties of statistical estimators in stochastic programming. The Annals of Statistics, 17(2):841–858, 1989.
  19. A. W. Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
  20. M. J. Wainwright. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning. arXiv preprint arXiv:1905.06265, 2019.
  21. M. J. Wainwright. Variance-reduced Q-learning is minimax optimal. arXiv preprint arXiv:1906.04697, 2019.
  22. J. Wellner et al. Weak convergence and empirical processes: with applications to statistics. Springer Science & Business Media, 2013.
  23. L. Xiao and T. Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
