Emergent Mind

Smooth Bandit Optimization: Generalization to Hölder Space

(2012.06076)
Published Dec 11, 2020 in cs.LG and stat.ML

Abstract

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-H\"older continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result is in generalization of the reward function to H\"older space with exponent $\alpha>1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For H\"older continuous functions, approaches based on random sampling in bins of a discretized domain suffices as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T\frac{d+\alpha}{d+2\alpha})$ for when $\alpha>1$, which matches existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of H\"older spaces indexed by $\alpha$, with a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.