LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits (2403.03219v3)

Published 5 Mar 2024 in cs.LG and stat.ML

Abstract: We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
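
As a quick check on how the margin-condition rate relates to the other two regimes, the exponents can be evaluated at the two ends of the range of $\beta$. This is only arithmetic on the rate stated in the abstract, not a step taken from the paper's analysis.

$$
\beta \to \infty: \quad \frac{1+\beta}{2+\beta} \to 1, \quad \frac{1}{2+\beta} \to 0, \quad \text{so} \quad \log(T)^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}} \to \log(T)
$$

$$
\beta \to 1: \quad \frac{1+\beta}{2+\beta} \to \frac{2}{3}, \quad \frac{1}{2+\beta} \to \frac{1}{3}, \quad \text{so} \quad \log(T)^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}} \to \log(T)^{\frac{2}{3}} T^{\frac{1}{3}}
$$

At $\beta = \infty$ the rate matches the $O(\log(T))$ bound of the stochastic regime with a uniformly bounded gap. Near the excluded endpoint $\beta = 1$ it approaches $\log(T)^{2/3} T^{1/3}$, which still grows more slowly than the $O(\sqrt{T})$ adversarial bound.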
