Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

(arXiv:2102.07387)
Published Feb 15, 2021 in cs.LG

Abstract

We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e. $f_t(w) = \ell_t(\hat{y}_t(w))$ where the output of $\hat{y}_t$ is one-dimensional. At each round, the learner observes a context $x_t$, plays a prediction $\hat{y}_t(w_t; x_t)$ (e.g. $\hat{y}_t(\cdot) = \langle x_t, \cdot \rangle$) for some $w_t \in \mathbb{R}^d$, and observes the loss $\ell_t(\hat{y}_t(w_t))$, where $\ell_t$ is a convex, Lipschitz-continuous function. The goal is to minimize the standard regret metric. This pseudo-1d bandit convex optimization problem (SBCO) arises frequently in domains such as online decision-making and parameter tuning in large systems. For this problem, we first show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ on the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm, \sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively, guaranteeing the optimal regret bound above, up to additional logarithmic factors. In contrast, applying state-of-the-art online convex optimization methods yields $\tilde{O}\left(\min\left(d^{9.5}\sqrt{T}, \sqrt{d}\,T^{3/4}\right)\right)$ regret, which is significantly suboptimal in $d$.
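To make the setting concrete, here is a minimal illustrative sketch (Python/NumPy) of one-point bandit gradient descent specialized to the pseudo-1d structure: because the loss depends on $w$ only through the scalar prediction $\langle x_t, w \rangle$, the random perturbation can live in one dimension instead of $d$. The step size `eta`, perturbation scale `delta`, unit-ball projection, and quadratic toy losses are assumptions for demonstration, not details from the paper; in particular, this is not the paper's \sbcalg, which also incorporates a kernelized exponential weights component.

```python
import numpy as np

def pseudo_1d_bandit_ogd(contexts, losses, eta=0.05, delta=0.1):
    """One-point bandit OGD sketch for f_t(w) = loss_t(<x_t, w>).

    contexts: list of d-dimensional arrays x_t
    losses:   list of scalar functions loss_t(y), each queried once per round
    Returns the sequence of played (perturbed) predictions.
    """
    d = contexts[0].shape[0]
    w = np.zeros(d)
    played = []
    for x_t, loss_t in zip(contexts, losses):
        u = np.random.choice([-1.0, 1.0])    # random sign: a 1d direction,
                                             # not a d-dimensional sphere sample
        y = x_t @ w + delta * u              # perturbed scalar prediction
        cost = loss_t(y)                     # bandit (zeroth-order) feedback
        g = (cost / delta) * u * x_t         # chain rule: estimates loss_t'(y) * x_t
        w = w - eta * g                      # online gradient descent step
        w = w / max(1.0, np.linalg.norm(w))  # project back onto the unit ball
        played.append(y)
    return played

# Toy usage: quadratic losses loss_t(y) = (y - target_t)^2 with random contexts.
rng = np.random.default_rng(0)
T, d = 1000, 5
xs = [rng.normal(size=d) / np.sqrt(d) for _ in range(T)]
fs = [lambda y, target=rng.normal(): (y - target) ** 2 for _ in range(T)]
preds = pseudo_1d_bandit_ogd(xs, fs)
print("mean loss over last 100 rounds:",
      np.mean([f(y) for f, y in zip(fs[-100:], preds[-100:])]))
```

One-point estimators of this kind are known to give regret on the order of $T^{3/4}$ on their own; per the abstract, it is the combination with a kernelized exponential weights method that attains the optimal $\min(\sqrt{dT}, T^{3/4})$ bound.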
