Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances (2306.07549v1)

Published 13 Jun 2023 in cs.LG and stat.ML

Abstract: We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms, where arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound the probabilities of misidentifying the best arm for both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to optimal experiment design with unknown variances, we believe that our results are of broad interest. Our experiments validate our theory and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.
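To make the variance-proportional allocation concrete, here is a minimal sketch of an SHVar-style procedure, written only from the description in the abstract (the paper's exact algorithm, rounding rules, and elimination schedule may differ). It runs sequential-halving-style rounds on simulated Gaussian arms: each round's budget is split among the surviving arms in proportion to their known variances, and the worse half by empirical mean is eliminated. The function name `sh_var_sketch` and the Gaussian reward model are illustrative assumptions.

```python
import math
import random

def sh_var_sketch(means, variances, budget, seed=0):
    """Illustrative sketch (not the paper's exact algorithm) of
    fixed-budget BAI with known, heterogeneous reward variances.

    Each elimination round splits its budget among surviving arms
    proportionally to their variances, so noisier arms are pulled
    more often; the worse half by empirical mean is then dropped.
    """
    rng = random.Random(seed)
    arms = list(range(len(means)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    per_round = budget // rounds
    while len(arms) > 1:
        total_var = sum(variances[a] for a in arms)
        # Variance-proportional allocation, at least one pull per arm.
        pulls = {a: max(1, round(per_round * variances[a] / total_var))
                 for a in arms}
        # Estimate each surviving arm's mean from fresh Gaussian samples.
        est = {}
        for a in arms:
            samples = [rng.gauss(means[a], math.sqrt(variances[a]))
                       for _ in range(pulls[a])]
            est[a] = sum(samples) / len(samples)
        # Keep the better half by empirical mean.
        arms.sort(key=lambda a: est[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]
```

For instance, with four arms whose first arm has both the highest mean and the highest variance, the high-variance arm receives the largest share of each round's pulls, which is exactly the regime where uniform allocation wastes budget on low-variance arms.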
