Graph Feedback Bandits with Similar Arms (2405.11171v1)

Published 18 May 2024 in cs.LG

Abstract: In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by clinical trials and recommendation problems, we assume that two arms are connected if and only if they are similar (i.e., their means are close enough). We establish a regret lower bound for this novel feedback structure and introduce two UCB-based algorithms: D-UCB, with problem-independent regret upper bounds, and C-UCB, with problem-dependent upper bounds. Leveraging the similarity structure, we also consider the scenario where the number of arms increases over time. Practical applications of this scenario include Q&A platforms (Reddit, Stack Overflow, Quora) and product reviews on Amazon and Flipkart: answers (or reviews) continually appear on these platforms, and the goal is to display the best ones at the top. When the means of the arms are independently generated from some distribution, we provide regret upper bounds for both algorithms and discuss the sub-linearity of these bounds in relation to the distribution of means. Finally, we conduct experiments to validate the theoretical results.
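
To make the feedback structure concrete, here is a minimal, hypothetical sketch of a UCB-style learner with similarity-graph side observations: pulling an arm also reveals a reward sample from every arm whose mean lies within a similarity radius epsilon. This is not the paper's D-UCB or C-UCB; the arm count K, horizon T, radius epsilon, and Bernoulli reward model are all illustrative assumptions.

import numpy as np

# Toy sketch only: UCB with similarity-graph feedback. Pulling arm i also
# reveals reward samples from every neighbor j with |mu_i - mu_j| <= epsilon.
# K, T, epsilon, and the Bernoulli rewards are illustrative assumptions;
# this is not the paper's D-UCB or C-UCB.
rng = np.random.default_rng(0)
K, T, epsilon = 10, 5000, 0.2
mu = rng.uniform(0.0, 1.0, size=K)                 # unknown arm means
# Similarity graph: arm i is connected to arm j iff |mu_i - mu_j| <= epsilon
neighbors = [np.flatnonzero(np.abs(mu - mu[i]) <= epsilon) for i in range(K)]

counts = np.zeros(K)                               # observations per arm
means = np.zeros(K)                                # empirical means

for t in range(1, T + 1):
    with np.errstate(divide="ignore", invalid="ignore"):
        bonus = np.sqrt(2.0 * np.log(t) / counts)
    ucb = np.where(counts > 0, means + bonus, np.inf)  # unseen arms go first
    i = int(np.argmax(ucb))

    # Graph feedback: observe the pulled arm AND all of its similar arms.
    for j in neighbors[i]:
        r = rng.binomial(1, mu[j])                 # Bernoulli reward draw
        counts[j] += 1
        means[j] += (r - means[j]) / counts[j]     # incremental mean update

The side observations are what the similarity structure buys: similar arms share samples, so clusters of near-optimal arms accumulate observations with far fewer pulls than in the standard bandit setting, which is the intuition behind the improved regret bounds the paper derives.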
