Emergent Mind

Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces

(1806.05069)
Published Jun 13, 2018 in math.OC and cs.LG

Abstract

We consider online convex optimization with a zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions, or their gradients. At each time step, she observes the value of the corresponding cost function evaluated at her chosen action (zero-order oracle). The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of a static optimal action had she known the sequence of cost functions a priori. We present a novel algorithm to minimize regret in unconstrained action spaces. Our algorithm hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values. The algorithm is independent of problem parameters. Letting $T$ denote the number of queries of the zero-order oracle and $n$ the problem dimension, the regret rate achieved is $O(n{2/3}T{2/3})$. Moreover, we adapt the presented algorithm to the setting with two-point feedback and demonstrate that the adapted procedure achieves the theoretical lower bound on the regret of $(n{1/2}T{1/2})$.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.