
Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

(2205.08234)
Published May 17, 2022 in cs.LG, cs.AI, and stat.ML

Abstract

In this paper, we present an online algorithm called {\it Delaytron} for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^T d_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. When the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^T d_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^T d_t$. For the case when $T$ and $\sum_{t=1}^T d_t$ are unknown, we use a doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^T d_t}\right)$. We show the effectiveness of our approach by experimenting on various datasets and comparing with state-of-the-art approaches.
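The abstract specifies only the interaction protocol (observe $\mathbf{x}_t$, predict $\tilde{y}_t$, receive $\mathbb{I}[\tilde{y}_t=y_t]$ after $d_t$ rounds, feedback missing if $t+d_t>T$) and the regret bounds, not the update rule itself. The following is a minimal sketch of that delayed bandit feedback loop, using a Banditron-style exploration and update as a stand-in; the function name `delayed_bandit_learning`, the epsilon-greedy sampling over $K$ labels, and the constant step size `eta` are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def delayed_bandit_learning(X, y, K, gamma, eta, delays, seed=0):
    """Sketch of online multiclass learning with delayed bandit feedback.

    X      : (T, d) array of examples x_t
    y      : (T,) true labels in {0, ..., K-1}, hidden from the learner
    K      : number of classes
    gamma  : exploration rate in (0, 1)
    eta    : constant step size (assumed; the bounds use a constant step
             size chosen from T and sum of d_t)
    delays : (T,) feedback delays d_t (unknown to the learner in the paper)
    """
    rng = np.random.default_rng(seed)
    T, d = X.shape
    W = np.zeros((K, d))          # linear multiclass classifier
    pending = {}                  # arrival round -> list of delayed feedbacks
    revealed_mistakes = 0

    for t in range(T):
        x_t = X[t]
        y_hat = int(np.argmax(W @ x_t))        # greedy prediction

        # epsilon-greedy exploration over the K labels (assumption)
        probs = np.full(K, gamma / K)
        probs[y_hat] += 1.0 - gamma
        y_tilde = int(rng.choice(K, p=probs))

        # feedback I[y_tilde == y_t] arrives only d_t rounds later;
        # if it would arrive after round T, it is treated as missing
        arrival = t + delays[t]
        if arrival < T:
            pending.setdefault(arrival, []).append(
                (x_t, y_hat, y_tilde, probs[y_tilde], y[t]))

        # process every feedback scheduled to arrive at this round
        for x_s, y_hat_s, y_tilde_s, p_s, y_s in pending.pop(t, []):
            correct = float(y_tilde_s == y_s)   # the only bit revealed
            revealed_mistakes += int(y_tilde_s != y_s)
            # Banditron-style unbiased update from bandit feedback (assumption)
            U = np.zeros_like(W)
            U[y_tilde_s] += (correct / p_s) * x_s
            U[y_hat_s]   -= x_s
            W += eta * U

    return W, revealed_mistakes
```

The `pending` buffer is what distinguishes the delayed setting from standard bandit multiclass learning: updates are applied only once the corresponding feedback arrives, so at any round the learner may apply zero, one, or several queued updates.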
