Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning (2302.00997v5)
Abstract: We consider an online two-stage stochastic optimization problem with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization, and then take the second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a given set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms. Moreover, the regret bound of our algorithm can be reduced to the regret bound of the embedded adversarial learning algorithms. Based on our framework, we obtain new results under various settings. When the model parameter at each period is drawn from identical distributions, we derive \textit{state-of-the-art} $O(\sqrt{T})$ regret that improves previous bounds under special cases. Our algorithm is also robust to adversarial corruptions of model parameter realizations. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions.
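The per-period protocol described in the abstract (first-stage action, parameter realization, second-stage action, long-term constraint on the average second-stage decision) can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: the capacity/demand model, the cost coefficients `c` and `s`, the target `beta`, and the plain projected-gradient updates for the first-stage action and the dual variable are all assumptions chosen only to make the loop concrete.

```python
import math
import random


def simulate(T=2000, beta=0.3, c=0.2, s=0.1, seed=0):
    """Toy online two-stage loop with one long-term constraint (illustrative only).

    Assumed model: capacity x in [0, 1] is chosen first, demand theta in [0, 1]
    is then observed, and the served amount y must lie in [0, min(x, theta)].
    Per-period cost is c * x + s * y; the long-term constraint asks that the
    average of y be at least beta.
    """
    rng = random.Random(seed)
    eta = 1.0 / math.sqrt(T)   # step size of order 1/sqrt(T) (an assumption)
    x, lam = 0.5, 0.0          # first-stage action and dual variable
    total_cost, total_y = 0.0, 0.0

    for _ in range(T):
        # First stage: commit to x before the parameter is revealed.
        x_t = x

        # Parameter realization (i.i.d. uniform demand in this toy model).
        theta = rng.random()

        # Second stage: minimize s * y - lam * y over [0, min(x_t, theta)],
        # i.e. serve as much as possible only when the dual price exceeds s.
        y = min(x_t, theta) if lam > s else 0.0

        total_cost += c * x_t + s * y
        total_y += y

        # Primal step: subgradient of the Lagrangian with respect to x.
        grad_x = c - (max(lam - s, 0.0) if x_t < theta else 0.0)
        x = min(1.0, max(0.0, x_t - eta * grad_x))

        # Dual step: raise the price when served amount falls short of beta.
        lam = max(0.0, lam + eta * (beta - y))

    return total_cost, total_y / T


if __name__ == "__main__":
    cost, avg_y = simulate()
    print(f"cumulative cost: {cost:.2f}, average second-stage decision: {avg_y:.3f}")
```

The dual variable `lam` acts as a price on constraint violation: it grows while the running service level is below `beta` and shrinks otherwise, which is the usual way a long-term constraint is folded into per-period decisions in this kind of sketch.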