Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification (2202.05193v3)

Published 10 Feb 2022 in stat.ML, cs.LG, and math.PR

Abstract: We consider the fixed-budget best arm identification problem with rewards following normal distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the arm with the largest mean, via an adaptive experiment conducted using an algorithm. The algorithm's performance is evaluated by simple regret, reflecting the quality of the estimated best arm. While frequentist simple regret can decrease exponentially with respect to $T$, Bayesian simple regret decreases polynomially. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not yield an exponential decrease in simple regret under certain parameter settings. This contrasts with the numerous findings that suggest the asymptotic equivalence of Bayesian and frequentist approaches in fixed sampling regimes. Although the Bayes optimal algorithm is formulated as a recursive equation that is virtually impossible to compute exactly, we lay the groundwork for future research by introducing a novel concept termed the expected Bellman improvement.
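As context for the abstract, below is a minimal, self-contained simulation sketch of the fixed-budget setting it describes: simple regret is the gap $\mu^\star - \mu_{J_T}$ between the best arm's mean and the mean of the arm $J_T$ recommended after the budget $T$ is spent. The arm means, budget, Gaussian prior, and both allocation rules are illustrative assumptions; neither is the paper's Bayes optimal algorithm, which the authors note is defined by a recursive (Bellman) equation that is virtually impossible to compute exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_regret(mu, chosen):
    """Simple regret: gap between the best mean and the recommended arm's mean."""
    return mu.max() - mu[chosen]

def uniform_allocation(mu, T, rng):
    """Pull each arm ~T/K times (round-robin), then recommend the empirical best arm."""
    K = len(mu)
    sums, counts = np.zeros(K), np.zeros(K)
    for t in range(T):
        a = t % K                                   # fixed, non-adaptive allocation
        sums[a] += rng.normal(mu[a], 1.0)           # unit-variance Gaussian reward
        counts[a] += 1
    return int(np.argmax(sums / counts))

def thompson_allocation(mu, T, rng):
    """Illustrative Bayesian allocation: sample arms by posterior draws
    (N(0,1) prior, unit observation noise), then recommend the posterior-best arm."""
    K = len(mu)
    sums, counts = np.zeros(K), np.zeros(K)
    for _ in range(T):
        # Gaussian posterior of arm a: mean sums[a]/(counts[a]+1), variance 1/(counts[a]+1)
        draws = rng.normal(sums / (counts + 1), 1.0 / np.sqrt(counts + 1))
        a = int(np.argmax(draws))
        sums[a] += rng.normal(mu[a], 1.0)
        counts[a] += 1
    return int(np.argmax(sums / (counts + 1)))

mu = np.array([0.5, 0.3, 0.0])                      # illustrative arm means (K = 3)
T, runs = 300, 2000
for name, alg in [("uniform", uniform_allocation), ("thompson", thompson_allocation)]:
    regrets = [simple_regret(mu, alg(mu, T, rng)) for _ in range(runs)]
    print(f"{name}: mean simple regret = {np.mean(regrets):.4f}")
```

Even non-adaptive uniform allocation achieves exponentially decaying frequentist simple regret on a fixed instance; that exponential rate is the benchmark which, per the abstract, the Bayes optimal algorithm can fail to match under certain parameter settings.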
