Emergent Mind

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

(2403.03219)
Published Mar 5, 2024 in cs.LG and stat.ML

Abstract

This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy $O(\log2(T))$ for the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. For this issue, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter $\beta \in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\left{\log(T)\right}{\frac{1+\beta}{2+\beta}}T{\frac{1}{2+\beta}}\right)$. Here, $\beta= \infty$ corresponds to the case in the existing studies where a lower bound exists in the suboptimality gap, and our regret satisfies $O(\log(T))$ in that case. Our proposed algorithm is based on the Follow-The-Regularized-Leader with the Tsallis entropy and referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.