Restless Linear Bandits (2405.10817v1)
Abstract: A more general formulation of the linear bandit problem is considered, allowing for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,~t \in \mathbb{N})$ which gives rise to the pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandit with i.i.d. noise and the finite-armed restless bandit. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d\,n\,\mathrm{polylog}(n)}\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and to construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.
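The abstract's ingredients can be illustrated with a minimal sketch: an optimistic linear-bandit loop that maintains a ridge estimate of $\mathbb{E}\theta_t$ inside a confidence ellipsoid, while updating only on well-separated rounds so the retained samples are nearly independent. This is not the paper's LinMix-UCB; the AR(1) parameter process, the action set, the confidence radius `beta`, and the fixed sub-sampling gap (a crude stand-in for the Berbee-coupling-based sample selection) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 2000
theta_mean = np.array([0.5, -0.2, 0.3])   # E[theta_t], unknown to the learner

def draw_theta(prev):
    # AR(1)-style parameters: a simple stand-in for a phi-mixing sequence
    # with stationary mean theta_mean and exponentially decaying dependence.
    return 0.9 * prev + 0.1 * theta_mean + 0.05 * rng.standard_normal(d)

arms = np.vstack([np.eye(d), -np.eye(d)])  # hypothetical finite action set

lam, beta = 1.0, 2.0   # ridge parameter and (assumed) confidence-ellipsoid radius
gap = 10               # keep only every `gap`-th round for estimation, so the
                       # retained samples are nearly independent (the paper
                       # instead selects such samples via Berbee's coupling lemma)

V = lam * np.eye(d)    # regularized Gram matrix of the retained actions
b = np.zeros(d)        # accumulated reward-weighted actions
theta = theta_mean.copy()

for t in range(n):
    theta = draw_theta(theta)
    theta_hat = np.linalg.solve(V, b)      # ridge estimate of E[theta_t]
    Vinv = np.linalg.inv(V)
    # optimistic index: estimated mean reward plus ellipsoidal exploration bonus
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
    x = arms[np.argmax(arms @ theta_hat + beta * bonus)]
    reward = x @ theta + 0.1 * rng.standard_normal()
    if t % gap == 0:                       # update only on well-separated rounds
        V += np.outer(x, x)
        b += reward * x

print(np.round(theta_hat, 2))
```

With a mixing parameter process, estimating on all rounds would bias the confidence ellipsoid; thinning the updates trades sample size for near-independence, which is the trade-off the paper controls explicitly through the $\varphi$-mixing coefficients.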
- P. Auer, “Using confidence bounds for exploitation-exploration trade-offs,” Journal of Machine Learning Research, vol. 3, no. Nov, pp. 397–422, 2002.
- Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, “Improved algorithms for linear stochastic bandits,” in Advances in Neural Information Processing Systems, vol. 24, 2011.
- S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.
- R. Ortner, D. Ryabko, P. Auer, and R. Munos, “Regret bounds for restless Markov bandits,” Theoretical Computer Science, vol. 558, pp. 62–76, 2014.
- S. Grünewälder and A. Khaleghi, “Approximations of the restless bandit problem,” The Journal of Machine Learning Research, vol. 20, no. 1, pp. 514–550, 2019.
- C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of optimal queuing network control,” Mathematics of Operations Research, vol. 24, no. 2, pp. 293–305, 1999.
- Q. Chen, N. Golrezaei, and D. Bouneffouf, “Non-stationary bandits with auto-regressive temporal dependency,” Advances in Neural Information Processing Systems, vol. 36, pp. 7895–7929, 2023.
- H. C. Berbee, “Random walks with stationary increments and renewal theory,” Mathematisch Centrum, 1979.
- S. Grünewälder and A. Khaleghi, “Estimating the mixing coefficients of geometrically ergodic Markov processes,” arXiv preprint arXiv:2402.07296, 2024.
- A. Khaleghi and G. Lugosi, “Inferring the mixing properties of a stationary ergodic process from a single sample-path,” IEEE Transactions on Information Theory, 2023.