Transfer in Sequential Multi-armed Bandits via Reward Samples (2403.12428v1)

Published 19 Mar 2024 in cs.LG and stat.ML

Abstract: We consider a sequential stochastic multi-armed bandit problem where the agent interacts with a bandit over multiple episodes. The reward distributions of the arms remain constant throughout an episode but can change across episodes. We propose an algorithm based on UCB that transfers the reward samples from previous episodes to improve the cumulative regret over all the episodes. We provide a regret analysis and empirical results for our algorithm, which show significant improvement over the standard UCB algorithm without transfer.
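For orientation, the baseline named in the abstract (standard UCB without transfer) can be sketched as below. This is a minimal illustration of textbook UCB1 restarted from scratch in every episode, not the paper's transfer algorithm. The Bernoulli rewards, the function name `ucb1`, the arm means, and the episode lengths are all assumptions chosen for the example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (assumed reward model) for one episode.

    Returns the cumulative pseudo-regret and the per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once before using confidence bounds.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret, counts


# No-transfer baseline: the arm means change between episodes (hypothetical
# values), and UCB1 discards all earlier samples at each episode boundary.
episodes = [[0.2, 0.8], [0.3, 0.7], [0.25, 0.75]]
total_regret = sum(ucb1(m, horizon=1000, seed=i)[0] for i, m in enumerate(episodes))
```

Because each episode starts with empty statistics, the exploration cost is paid again every time the distributions change. That repeated cost is what reusing samples from previous episodes aims to reduce.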

Related Papers

Regional Multi-Armed Bandits (2018)
Multi-armed Bandit Algorithm against Strategic Replication (2021)
Lifelong Learning in Multi-Armed Bandits (2020)
Bayesian Algorithms for Decentralized Stochastic Bandits (2020)
Multi-armed Bandit Problem with Known Trend (2015)
