Papers
Topics
Authors
Recent
2000 character limit reached

Truncated LinUCB for Stochastic Linear Bandits (2202.11735v4)

Published 23 Feb 2022 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension $d$ and time horizon $T$, due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time $S$ and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve $O(d\log(T))$ regret if $S = Cd\log(T)$ for a sufficiently large constant $C$, and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both $d$ and $T$ under a low dimensional regime. Further, if $S = d\log{\kappa}(T)$ for some $\kappa>1$, the loss compared to the optimal is a multiplicative $\log\log(T)$ factor, which does not depend on $d$. This insensitivity to overshooting in choosing the truncation time of Tr-LinUCB is of practical importance.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.