
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems (2403.03361v1)

Published 5 Mar 2024 in stat.ML and cs.LG

Abstract: This paper studies the Bayesian regret of a variant of the Thompson Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis of [Dong and Van Roy, 2020], which proved a bound with a regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, establish new bounds for a variant of Thompson Sampling that depend on the metric entropy of the action space. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
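
For readers unfamiliar with the setting, the following is a minimal sketch of standard Thompson Sampling on a $d$-dimensional linear bandit with a finite action set. It is not the variant analysed in the paper, and it does not implement the chaining argument. The Gaussian prior, the Gaussian noise model, and the names `linear_thompson_sampling`, `actions`, `theta_star`, and `noise_std` are all assumptions made for illustration.

```python
import numpy as np


def linear_thompson_sampling(actions, theta_star, horizon, noise_std=1.0, seed=0):
    """Standard Gaussian linear Thompson Sampling (illustrative, not the paper's variant).

    actions:    (K, d) array of candidate actions.
    theta_star: (d,) true parameter, used only to simulate rewards.
    Returns the cumulative regret after each round, shape (horizon,).
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    precision = np.eye(d)   # assumed prior N(0, I), stored as a precision matrix
    b = np.zeros(d)         # precision-weighted mean accumulator
    best_reward = np.max(actions @ theta_star)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Posterior over theta is Gaussian with this mean and covariance.
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        mean = cov @ b
        # Sample a parameter and act greedily with respect to the sample.
        theta_sample = rng.multivariate_normal(mean, cov)
        a = actions[np.argmax(actions @ theta_sample)]
        # Observe a noisy linear reward and update the posterior.
        reward = a @ theta_star + noise_std * rng.normal()
        precision += np.outer(a, a) / noise_std**2
        b += a * reward / noise_std**2
        total += best_reward - a @ theta_star
        regret[t] = total
    return regret
```

Calling the function with, for example, a few dozen unit-norm actions in $\mathbb{R}^3$ and a fixed `theta_star` returns a non-decreasing cumulative regret curve. The regret here is measured for one fixed parameter, whereas the bounds in the abstract concern the Bayesian regret, which averages over the prior on the parameter.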

References (10)
  1. D. Russo and B. Van Roy, “An Information-Theoretic Analysis of Thompson Sampling,” Jun. 2015, arXiv:1403.5341 [cs]. [Online]. Available: http://arxiv.org/abs/1403.5341
  2. S. Dong and B. Van Roy, “An Information-Theoretic Analysis for Thompson Sampling with Many Actions,” Jul. 2020, arXiv:1805.11845 [cs, math, stat]. [Online]. Available: http://arxiv.org/abs/1805.11845
  3. W. R. Thompson, “On the likelihood that one unknown probability exceeds another in view of the evidence of two samples,” Biometrika, vol. 25, no. 3-4, pp. 285–294, 1933.
  4. D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, Z. Wen et al., “A Tutorial on Thompson Sampling,” Foundations and Trends® in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018.
  5. D. Russo and B. Van Roy, “Learning to Optimize via Information-Directed Sampling,” Jul. 2017, arXiv:1403.5556 [cs]. [Online]. Available: http://arxiv.org/abs/1403.5556
  6. O. Chapelle and L. Li, “An empirical evaluation of Thompson sampling,” Advances in neural information processing systems, vol. 24, 2011.
  7. V. Dani, T. P. Hayes, and S. M. Kakade, “Stochastic Linear Optimization under Bandit Feedback,” in 21st Annual Conference on Learning Theory, pp. 355–366, 2008.
  8. G. Neu, I. Olkhovskaia, M. Papini, and L. Schwartz, “Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits,” Advances in Neural Information Processing Systems, vol. 35, pp. 9486–9498, 2022.
  9. A. Gouverneur, B. Rodríguez-Gálvez, T. J. Oechtering, and M. Skoglund, “Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards,” Apr. 2023, arXiv:2304.13593 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2304.13593
  10. J. Negrea, M. Haghifam, G. K. Dziugaite, A. Khisti, and D. M. Roy, “Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates,” Jan. 2020, arXiv:1911.02151 [cs, math, stat]. [Online]. Available: http://arxiv.org/abs/1911.02151
