Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems (2403.03361v1)

Published 5 Mar 2024 in stat.ML and cs.LG

Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis of [Dong and Van Roy, 2020], who proved a bound with regret rate $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, establish new bounds for a variant of Thompson-Sampling that depend on the metric entropy of the action space. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
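
To make the size of the improvement concrete, the two rates quoted in the abstract can be compared directly. The following is only a worked comparison of the stated orders; it ignores the constants hidden by the $O(\cdot)$ notation and assumes the natural logarithm, neither of which is specified in the abstract.

```latex
% Ratio of the earlier rate to the new rate (constants ignored; illustrative only).
\[
  \frac{d\sqrt{T\log(T)}}{d\sqrt{T}} = \sqrt{\log(T)}
\]
% Example under the natural-log assumption: for T = 10^6,
% log(T) is about 13.8, so the ratio is about 3.7.
\[
  T = 10^{6} \;\Longrightarrow\; \sqrt{\log(T)} \approx \sqrt{13.8} \approx 3.7
\]
```

The gap between the two rates is therefore independent of the dimension $d$ and grows only slowly with the horizon $T$.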

References (10)
  1. D. Russo and B. Van Roy, “An Information-Theoretic Analysis of Thompson Sampling,” Jun. 2015, arXiv:1403.5341 [cs]. [Online]. Available: http://arxiv.org/abs/1403.5341
  2. S. Dong and B. Van Roy, “An Information-Theoretic Analysis for Thompson Sampling with Many Actions,” Jul. 2020, arXiv:1805.11845 [cs, math, stat]. [Online]. Available: http://arxiv.org/abs/1805.11845
  3. W. R. Thompson, “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples,” Biometrika, vol. 25, no. 3-4, pp. 285–294, 1933.
  4. D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A Tutorial on Thompson Sampling,” Foundations and Trends® in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018.
  5. D. Russo and B. Van Roy, “Learning to Optimize via Information-Directed Sampling,” Jul. 2017, arXiv:1403.5556 [cs]. [Online]. Available: http://arxiv.org/abs/1403.5556
  6. O. Chapelle and L. Li, “An Empirical Evaluation of Thompson Sampling,” Advances in Neural Information Processing Systems, vol. 24, 2011.
  7. V. Dani, T. P. Hayes, and S. M. Kakade, “Stochastic Linear Optimization under Bandit Feedback,” 21st Annual Conference on Learning Theory, pp. 355–366, 2008.
  8. G. Neu, I. Olkhovskaia, M. Papini, and L. Schwartz, “Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits,” Advances in Neural Information Processing Systems, vol. 35, pp. 9486–9498, 2022.
  9. A. Gouverneur, B. Rodríguez-Gálvez, T. J. Oechtering, and M. Skoglund, “Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards,” Apr. 2023, arXiv:2304.13593 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2304.13593
  10. J. Negrea, M. Haghifam, G. K. Dziugaite, A. Khisti, and D. M. Roy, “Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates,” Jan. 2020, arXiv:1911.02151 [cs, math, stat]. [Online]. Available: http://arxiv.org/abs/1911.02151
