Variance-Dependent Best Arm Identification

(arXiv:2106.10417)
Published Jun 19, 2021 in cs.LG and stat.ML

Abstract

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called \textit{grouped median elimination}. The proposed algorithm is guaranteed to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln \ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm, and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over variance-independent algorithms in some favorable scenarios, and it is the first result that removes the extra $\ln n$ factor on the best arm compared with the state of the art. We further show that $\Omega\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby showing that our algorithm is optimal up to doubly logarithmic terms.
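The abstract's *grouped median elimination* builds on classical median elimination, which repeatedly samples every surviving arm and discards the empirically worse half each round. The paper's variance-adaptive, grouped variant is not specified in the abstract, so the sketch below shows only the classical building block (an $(\epsilon,\delta)$-PAC best-arm routine in the style of Even-Dar et al.); the sample count per round, the schedule constants, and the arm interface are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def median_elimination(arms, eps, delta, rng=random.Random(0)):
    """Return the index of an eps-optimal arm with probability >= 1 - delta.

    arms: list of callables; arms[i](rng) draws one reward in [0, 1].
    This is the classical (non-grouped) scheme, sketched for illustration.
    """
    survivors = list(range(len(arms)))
    eps_l, delta_l = eps / 4.0, delta / 2.0  # per-round error budgets
    while len(survivors) > 1:
        # Hoeffding bound: m pulls per arm make each empirical mean
        # (eps_l / 2)-accurate with probability >= 1 - delta_l.
        m = math.ceil((2.0 / (eps_l / 2.0) ** 2) * math.log(3.0 / delta_l))
        means = {i: sum(arms[i](rng) for _ in range(m)) / m for i in survivors}
        # Keep the empirically better half (arms at or above the median).
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: (len(survivors) + 1) // 2]
        eps_l *= 0.75     # tighten accuracy each round...
        delta_l /= 2.0    # ...while shrinking the failure budget
    return survivors[0]
```

A usage sketch: with three Bernoulli arms of means 0.9, 0.5, and 0.2, `median_elimination(arms, eps=0.2, delta=0.1)` identifies arm 0 with high probability. Note the per-round budgets form geometric series summing to at most `eps` and `delta`, which is what lets the round count grow without blowing the overall guarantee; the paper's contribution is making the sample counts depend on the per-arm variances $\sigma_i^2$ rather than worst-case bounds.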
