Bayesian Fixed-Budget Best-Arm Identification (2211.08572v3)

Published 15 Nov 2022 in cs.LG and stat.ML

Abstract: Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
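
To make the abstract's elimination scheme concrete, here is a minimal sketch. This is not the paper's exact algorithm: it is an illustrative sequential-halving-style Bayesian elimination for Gaussian bandits, assuming Gaussian priors, known reward noise variance, and an equal budget split per round. All names (`bayesian_elimination`, `sample_arm`, `mu0`, `sigma0_sq`) are hypothetical, not from the paper.

```python
import numpy as np

def bayesian_elimination(mu0, sigma0_sq, sigma_sq, budget, sample_arm):
    """Sequential-halving-style Bayesian elimination (illustrative sketch).

    mu0, sigma0_sq : per-arm Gaussian prior means and variances
    sigma_sq       : known reward noise variance
    budget         : total number of arm pulls allowed
    sample_arm     : callable mapping an arm index to one stochastic reward
    """
    K = len(mu0)
    active = list(range(K))
    # Track each arm's Gaussian posterior via precision and
    # precision-weighted mean (conjugate, known-variance case).
    prec = 1.0 / np.asarray(sigma0_sq, dtype=float)
    pw_mean = prec * np.asarray(mu0, dtype=float)

    n_rounds = int(np.ceil(np.log2(K)))
    per_round = budget // n_rounds
    for _ in range(n_rounds):
        if len(active) == 1:
            break
        pulls = max(per_round // len(active), 1)  # equal split within the round
        for i in active:
            for _ in range(pulls):
                r = sample_arm(i)
                # Conjugate posterior update for a Gaussian with known variance.
                prec[i] += 1.0 / sigma_sq
                pw_mean[i] += r / sigma_sq
        post_mean = pw_mean / prec
        # Eliminate the bottom half of active arms by posterior mean.
        active.sort(key=lambda i: post_mean[i], reverse=True)
        active = active[: max(1, len(active) // 2)]
    return active[0]

# Toy usage: four Gaussian arms, arm 1 (mean 0.5) is optimal.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.4, 0.1]
best = bayesian_elimination(
    mu0=[0.0] * 4, sigma0_sq=[1.0] * 4, sigma_sq=1.0, budget=400,
    sample_arm=lambda i: rng.normal(true_means[i], 1.0))
print(best)  # typically 1
```

Note how the prior enters only through the initial precision and precision-weighted mean; a more informative prior (larger initial precision on the right arm) shifts early posterior means toward the truth, which is the mechanism the paper's prior-dependent bound quantifies.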

Authors (4)
  1. Alexia Atsidakou (7 papers)
  2. Sumeet Katariya (20 papers)
  3. Sujay Sanghavi (97 papers)
  4. Branislav Kveton (98 papers)
Citations (10)
