Variance-Dependent Best Arm Identification

(arXiv:2106.10417)
Published Jun 19, 2021 in cs.LG and stat.ML

Abstract

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called \textit{grouped median elimination}. The proposed algorithm is guaranteed to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln \ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm, and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over variance-independent algorithms in some favorable scenarios, and it is the first result that removes the extra $\ln n$ factor on the best arm compared with the state of the art. We further show that $\Omega\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby showing that our algorithm is optimal up to doubly logarithmic terms.
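The abstract's *grouped median elimination* builds on classical median elimination, which repeatedly samples every surviving arm and discards the empirically worse half each round. The paper's variance-adaptive, grouped variant is not specified in the abstract, so the sketch below shows only the classical building block (an $(\epsilon,\delta)$-PAC best-arm routine in the style of Even-Dar et al.); the sample count per round, the schedule constants, and the arm interface are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def median_elimination(arms, eps, delta, rng=random.Random(0)):
    """Return the index of an eps-optimal arm with probability >= 1 - delta.

    arms: list of callables; arms[i](rng) draws one reward in [0, 1].
    This is the classical (non-grouped) scheme, sketched for illustration.
    """
    survivors = list(range(len(arms)))
    eps_l, delta_l = eps / 4.0, delta / 2.0  # per-round error budgets
    while len(survivors) > 1:
        # Hoeffding bound: m pulls per arm make each empirical mean
        # (eps_l / 2)-accurate with probability >= 1 - delta_l.
        m = math.ceil((2.0 / (eps_l / 2.0) ** 2) * math.log(3.0 / delta_l))
        means = {i: sum(arms[i](rng) for _ in range(m)) / m for i in survivors}
        # Keep the empirically better half (arms at or above the median).
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: (len(survivors) + 1) // 2]
        eps_l *= 0.75     # tighten accuracy each round...
        delta_l /= 2.0    # ...while shrinking the failure budget
    return survivors[0]
```

A usage sketch: with three Bernoulli arms of means 0.9, 0.5, and 0.2, `median_elimination(arms, eps=0.2, delta=0.1)` identifies arm 0 with high probability. Note the per-round budgets form geometric series summing to at most `eps` and `delta`, which is what lets the round count grow without blowing the overall guarantee; the paper's contribution is making the sample counts depend on the per-arm variances $\sigma_i^2$ rather than worst-case bounds.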
