Kernel-based methods for bandit convex optimization

(1607.03084)
Published Jul 11, 2016 in cs.LG, cs.DS, math.OC, and stat.ML

Abstract

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n \sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3 / \epsilon^2$.
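
The link between the conjectured regret and the conjectured query complexity can be sketched in one line. This is only an illustration, assuming the standard online-to-batch conversion (in the stochastic setting, the average of the played points has optimization error at most the average regret). The symbols $R_T$ (regret after $T$ queries) and $C$ (a constant absorbing logarithmic factors) are introduced here for the sketch and do not appear in the abstract.

$$
\frac{R_T}{T} \le \frac{C\, n^{1.5} \sqrt{T}}{T} = \frac{C\, n^{1.5}}{\sqrt{T}},
\qquad
\frac{C\, n^{1.5}}{\sqrt{T}} \le \epsilon
\;\Longleftrightarrow\;
T \ge \frac{C^2\, n^3}{\epsilon^2}.
$$

Under these assumptions, roughly $n^3 / \epsilon^2$ queries suffice to reach accuracy $\epsilon$, which matches the order stated in the conjecture up to the factors hidden in $C$.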
