Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality (2301.03785v2)

Published 10 Jan 2023 in stat.ML and cs.LG

Abstract: This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough for identifying the best arm (hence, achieving optimality) but not overly accurate to an unnecessary extent that creates excessive computational complexity (hence, maintaining efficiency). Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers a more general setting of any arbitrary family of distributions parameterized by their mean values (under mild regularity conditions). The optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and compare the performance with those of the existing ones.
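For readers unfamiliar with the top-two baselines the abstract contrasts against, the following is a minimal sketch of a fixed-$\beta$ top-two sampling rule with a generalized-likelihood-ratio stopping test. It is not the algorithm proposed in the paper. It assumes unit-variance Gaussian rewards, a heuristic stopping threshold $\log((1+\log t)/\delta)$, and a hypothetical function name `top_two_bai`; all of these are illustrative choices, not taken from the source.

```python
import math
import random


def top_two_bai(means, delta=0.01, beta=0.5, seed=0, max_steps=100_000):
    """Illustrative fixed-beta top-two rule (assumed unit-variance Gaussian arms).

    Returns (recommended_arm, number_of_samples_used).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # running reward sums per arm

    def pull(a):
        counts[a] += 1
        sums[a] += rng.gauss(means[a], 1.0)

    # Pull every arm once so that empirical means are defined.
    for a in range(k):
        pull(a)

    leader = 0
    for t in range(k, max_steps):
        mu = [sums[a] / counts[a] for a in range(k)]
        leader = max(range(k), key=lambda a: mu[a])

        def cost(a):
            # Gaussian transportation cost between the leader and arm a.
            n_l, n_a = counts[leader], counts[a]
            return (n_l * n_a / (n_l + n_a)) * (mu[leader] - mu[a]) ** 2 / 2.0

        # Challenger: the arm that is hardest to distinguish from the leader.
        challenger = min((a for a in range(k) if a != leader), key=cost)

        # Heuristic threshold (an assumption, not a calibrated guarantee).
        threshold = math.log((1.0 + math.log(t)) / delta)
        if cost(challenger) > threshold:
            return leader, t

        # Fixed beta: sample the leader with probability beta, else the challenger.
        pull(leader if rng.random() < beta else challenger)

    return leader, max_steps


arm, n_samples = top_two_bai([0.0, 0.5, 2.0])
print(arm, n_samples)
```

Because $\beta$ is fixed in advance rather than estimated from data, a rule of this kind is cheap per round but generally cannot match the optimal allocation, which is the gap the abstract describes.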

Authors (2)
  1. Arpan Mukherjee (20 papers)
  2. Ali Tajer (49 papers)