Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

(1902.01575)
Published Feb 5, 2019 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract

We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain an $O(\sqrt{TA \Upsilon_T \log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
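
To make the change-point detector mentioned in the abstract more concrete, here is a minimal sketch of a Bernoulli generalized likelihood ratio statistic on a stream of rewards in [0, 1]. It is an illustration under stated assumptions, not the authors' implementation: the function names (kl_bernoulli, glr_statistic, detects_change) are hypothetical, and the threshold is left as a caller-supplied parameter because the abstract does not specify how it is set.

```python
import math


def kl_bernoulli(p, q, eps=1e-15):
    """Binary relative entropy kl(p, q), clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(rewards):
    """Largest Bernoulli GLR statistic over all split points s in 1..n-1.

    For each split, the empirical means before and after s are compared
    with the overall empirical mean.
    """
    n = len(rewards)
    if n < 2:
        return 0.0
    total = sum(rewards)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(rewards, threshold):
    """Flag a change when the statistic exceeds the threshold.

    The threshold is an assumption of this sketch; it is supplied by the caller.
    """
    return glr_statistic(rewards) > threshold
```

In a bandit setting, a detector like this would presumably be run on each arm's reward history, with the history reset once a change is flagged; that wiring is not shown here.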
