Stochastic contextual bandits with graph feedback: from independence number to MAS number (2402.18591v2)

Published 12 Feb 2024 in cs.LG, cs.GT, math.ST, and stat.TH

Abstract: We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretic quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regret for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number essentially characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.
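
To make the observation model concrete, below is a minimal sketch, not taken from the paper, of one round of graph feedback together with brute-force computations of the two graph quantities the abstract compares: the independence number $\alpha(G)$ and the MAS number $\mathsf{m}(G)$. The toy graph (a transitive tournament on four actions), the reward means, and the noise level are all illustrative assumptions.

```python
# A minimal sketch (not from the paper): the graph-feedback observation
# model on a toy directed graph, plus brute-force computation of the
# independence number alpha(G) and the maximum acyclic subgraph (MAS)
# number m(G). Graph, reward means, and noise are illustrative choices.
import random
from itertools import combinations

# Toy feedback graph on actions {0, 1, 2, 3}: a transitive tournament
# (edge i -> j whenever i < j), which is transitively closed -- the
# graph class the abstract mentions for auctions and inventory control.
K = 4
edges = {(i, j) for i in range(K) for j in range(K) if i < j}

def out_neighbors(a):
    return {j for (i, j) in edges if i == a}

def is_independent(S):
    # No edge between any two distinct actions in S, in either direction.
    return all((i, j) not in edges and (j, i) not in edges
               for i in S for j in S if i != j)

def is_acyclic(S):
    # The subgraph induced on S has no directed cycle (Kahn's algorithm).
    S = set(S)
    indeg = {v: sum((u, v) in edges for u in S) for v in S}
    queue = [v for v in S if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in S:
            if (u, v) in edges:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return seen == len(S)

subsets = [S for r in range(K + 1) for S in combinations(range(K), r)]
alpha = max(len(S) for S in subsets if is_independent(S))
mas = max(len(S) for S in subsets if is_acyclic(S))
print(f"alpha(G) = {alpha}, m(G) = {mas}")  # alpha(G) = 1, m(G) = 4

# One round of graph feedback: playing action `a` reveals a noisy reward
# for `a` itself and for every out-neighbor of `a` in the graph.
mean_reward = [0.2, 0.5, 0.4, 0.7]  # arbitrary illustrative means
a = 0
observed = {b: mean_reward[b] + random.gauss(0, 0.1)
            for b in {a} | out_neighbors(a)}
print(f"playing {a} reveals rewards for actions {sorted(observed)}")
```

On this transitive tournament every pair of actions is connected, so $\alpha(G) = 1$, while the whole graph is acyclic, so $\mathsf{m}(G) = K$; this is the extreme case of the gap between the two quantities that, per the abstract, the lower bound $\Omega(\sqrt{\beta_M(G) T})$ interpolates across as the number of contexts $M$ varies.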

