Papers
Topics
Authors
Recent
2000 character limit reached

Bandit Online Learning with Unknown Delays (1807.03205v3)

Published 9 Jul 2018 in cs.LG and stat.ML

Abstract: This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function involved that become available through feedback, and are used to estimate the gradient appearing in the corresponding iterative algorithms. Since the challenging case of feedback with \emph{unknown} delays prevents one from constructing the sought gradient estimates, existing MAB and BCO algorithms become intractable. For such challenging setups, delayed exploration, exploitation, and exponential (DEXP3) iterations, along with delayed bandit gradient descent (DBGD) iterations are developed for MAB and BCO, respectively. Leveraging a unified analysis framework, it is established that the regret of DEXP3 and DBGD are ${\cal O}\big( \sqrt{K\bar{d}(T+D)} \big)$ and ${\cal O}\big( \sqrt{K(T+D)} \big)$, respectively, where $\bar{d}$ is the maximum delay and $D$ denotes the delay accumulated over $T$ slots. Numerical tests using both synthetic and real data validate the performance of DEXP3 and DBGD.

Citations (39)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.