
Transfer in Sequential Multi-armed Bandits via Reward Samples (2403.12428v1)

Published 19 Mar 2024 in cs.LG and stat.ML

Abstract: We consider a sequential stochastic multi-armed bandit problem where an agent interacts with the bandit over multiple episodes. The reward distributions of the arms remain constant within an episode but can change across episodes. We propose a UCB-based algorithm that transfers reward samples from previous episodes to improve cumulative regret across all episodes. We provide a regret analysis and empirical results for our algorithm, showing significant improvement over the standard UCB algorithm without transfer.

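The abstract describes the approach only at a high level, so the sketch below is a minimal, hypothetical illustration of the general idea rather than the paper's exact algorithm: a standard UCB1 learner whose per-arm statistics can be warm-started with reward samples carried over from earlier episodes. The names `pull`, `prior_counts`, `prior_sums`, and the exploration constant `c` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def ucb_with_transfer(pull, num_arms, horizon,
                      prior_counts=None, prior_sums=None, c=2.0):
    """UCB1 warm-started with reward samples from earlier episodes.

    `pull(arm)` returns a stochastic reward assumed bounded in [0, 1].
    `prior_counts` / `prior_sums` hold transferred per-arm statistics;
    leaving them as None recovers plain UCB1. The pooling rule here is
    an illustrative assumption, not the paper's transfer scheme.
    """
    counts = (np.zeros(num_arms) if prior_counts is None
              else np.asarray(prior_counts, dtype=float).copy())
    sums = (np.zeros(num_arms) if prior_sums is None
            else np.asarray(prior_sums, dtype=float).copy())
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if np.any(counts == 0):
            # Play each arm with no samples (transferred or new) once.
            arm = int(np.argmin(counts))
        else:
            # UCB index: empirical mean plus an exploration bonus that
            # shrinks as an arm accumulates transferred + new samples.
            index = sums / counts + np.sqrt(c * np.log(t) / counts)
            arm = int(np.argmax(index))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return counts, sums, total_reward

# Usage across two episodes: episode 1 runs without transfer, and its
# sample statistics warm-start episode 2.
rng = np.random.default_rng(0)
means = [0.3, 0.5, 0.7]
pull = lambda a: float(rng.random() < means[a])  # Bernoulli arms

c1, s1, _ = ucb_with_transfer(pull, num_arms=3, horizon=1000)
c2, s2, _ = ucb_with_transfer(pull, num_arms=3, horizon=1000,
                              prior_counts=c1, prior_sums=s1)
```

Note that when the reward distributions actually change between episodes, naively pooling all past samples biases the mean estimates; handling that trade-off is the substance of the paper's algorithm and analysis, which this sketch deliberately omits.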