Semi-Bandit Learning for Monotone Stochastic Optimization (2312.15427v1)
Abstract: Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the random variables that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.
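To make the semi-bandit feedback model concrete, here is a minimal Python sketch for a single period of a prophet-inequality-style problem. It is an illustration only, not the paper's algorithm: the function name `run_threshold_policy`, the fixed-threshold stopping rule, and the representation of each random variable as a zero-argument sampler are all assumptions made for this example. The point it shows is that the learner only receives samples from the random variables it actually probed before stopping.

```python
from typing import Callable, List, Tuple


def run_threshold_policy(
    samplers: List[Callable[[], float]],
    threshold: float,
) -> Tuple[float, List[Tuple[int, float]]]:
    """Run one period of a (hypothetical) fixed-threshold stopping policy.

    Items are probed in index order. The policy accepts the first value
    that is at least `threshold` and stops. Under semi-bandit feedback,
    only the probed items yield samples; items after the stopping point
    are never observed.
    """
    observed: List[Tuple[int, float]] = []
    for index, draw in enumerate(samplers):
        value = draw()                    # probing reveals one sample of item `index`
        observed.append((index, value))   # this sample is available for learning
        if value >= threshold:
            return value, observed        # accept and stop; later items stay unobserved
    return 0.0, observed                  # nothing accepted in this period


# Usage with deterministic stand-ins for the unknown distributions.
samplers = [lambda: 0.2, lambda: 0.9, lambda: 0.5]
reward, observed = run_threshold_policy(samplers, threshold=0.6)
```

In this run the policy stops at the second item, so the third item contributes no sample in this period. An online learner in this setting would have to build its distribution estimates from such partial observations across the $T$ periods.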