The Real Price of Bandit Information in Multiclass Classification (2405.10027v2)
Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input belongs to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary focus is the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right)}$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log factors) in all parameter regimes.
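To make the two regimes of the stated bound concrete, the sketch below (not from the paper; the parameter values are hypothetical) numerically compares the two terms $|H| + \sqrt{T}$ and $\sqrt{KT \log |H|}$, ignoring the constants and logarithmic factors hidden by the tilde notation.

```python
import math

def minimax_regret_rate(num_hypotheses: int, num_labels: int, horizon: int) -> float:
    """Evaluate min(|H| + sqrt(T), sqrt(K*T*log|H|)), the rate stated in the
    abstract, up to constants and log factors suppressed by the tilde notation."""
    new_bound = num_hypotheses + math.sqrt(horizon)                              # O~(|H| + sqrt(T))
    classic_bound = math.sqrt(num_labels * horizon * math.log(num_hypotheses))   # O~(sqrt(KT log|H|))
    return min(new_bound, classic_bound)

# Hypothetical parameters: a moderately sized hypothesis class where the new
# O~(|H| + sqrt(T)) bound is the smaller of the two terms.
K, T, H = 100, 10**6, 10**4
print(H + math.sqrt(T))                  # ~1.1e4
print(math.sqrt(K * T * math.log(H)))    # ~3.0e4
print(minimax_regret_rate(H, K, T))      # min of the two: ~1.1e4
```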
References
- Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pages 1638–1646. PMLR, 2014.
- Uncoupled learning dynamics with $o(\log t)$ swap regret in multiplayer games. Advances in Neural Information Processing Systems, 35:3292–3304, 2022.
- Near-optimal $\phi$-regret learning in extensive-form games. In International Conference on Machine Learning, pages 814–839. PMLR, 2023.
- Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
- P. Auer and P. M. Long. Structural results about on-line learning models with and without queries. Mach. Learn., 36(3):147–181, 1999. doi: 10.1023/A:1007614417594.
- The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
- Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 19–26. JMLR Workshop and Conference Proceedings, 2011.
- Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- Sparsity, variance and curvature in multi-armed bandits. In Algorithmic Learning Theory, pages 111–127. PMLR, 2018.
- Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 208–214. JMLR Workshop and Conference Proceedings, 2011.
- Network information theory. Elements of Information Theory, pages 374–458, 1991.
- A. Daniely and T. Helbertal. The price of bandit information in multiclass online classification. In Conference on Learning Theory, pages 93–104. PMLR, 2013.
- Multiclass learnability and the ERM principle. In Proceedings of the 24th Annual Conference on Learning Theory, pages 207–232. JMLR Workshop and Conference Proceedings, 2011.
- Efficient optimal learning for contextual bandits. arXiv preprint arXiv:1106.2369, 2011.
- L. Erez and T. Koren. Best-of-all-worlds bounds for online learning with feedback graphs. In NeurIPS, pages 28511–28521, 2021.
- Parametric bandits: The generalized linear case. Advances in Neural Information Processing Systems, 23, 2010.
- D. Foster and A. Rakhlin. Beyond UCB: Optimal and efficient contextual bandits with regression oracles. In International Conference on Machine Learning, pages 3199–3210. PMLR, 2020.
- Practical contextual bandits with regression oracles. In International Conference on Machine Learning, pages 1539–1548. PMLR, 2018.
- Adapting to misspecification in contextual bandits. Advances in Neural Information Processing Systems, 33:11478–11489, 2020a.
- Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective. arXiv preprint arXiv:2010.03104, 2020b.
- J. Geneson. A note on the price of bandit feedback for mistake-bounded online learning. Theoretical Computer Science, 874:42–45, 2021.
- S. Hanneke and L. Yang. Minimax analysis of active learning. J. Mach. Learn. Res., 16(1):3487–3602, 2015.
- E. Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
- T. Jin and H. Luo. Simultaneously learning stochastic and adversarial episodic MDPs with known transition. Advances in Neural Information Processing Systems, 33, 2020.
- The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition. Advances in Neural Information Processing Systems, 34:20491–20502, 2021.
- S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Efficient bandit algorithms for online multiclass prediction. In Proceedings of the 25th International Conference on Machine Learning, pages 440–447, 2008.
- J. Kwon and V. Perchet. Gains and losses are fundamentally different in regret minimization: The sparse case. The Journal of Machine Learning Research, 17(1):8106–8137, 2016.
- J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. Advances in Neural Information Processing Systems, 20(1):96–1, 2007.
- T. Lattimore and C. Szepesvari. Bandit Algorithms. Cambridge University Press, 2020.
- Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080. PMLR, 2017.
- P. M. Long. New bounds on the price of bandit feedback for mistake-bounded online multiclass learning. Theor. Comput. Sci., 808:159–163, 2020. doi: 10.1016/J.TCS.2019.11.017.
- H. B. McMahan and M. Streeter. Tighter bounds for multi-armed bandits with expert advice. In COLT, 2009.
- F. Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
- Multiclass online learnability under bandit feedback. arXiv preprint arXiv:2308.04620, 2023.
- A. Slivkins. Introduction to multi-armed bandits. SIGecom Exch., 18(1):28–30, 2020.
- C.-Y. Wei and H. Luo. More adaptive algorithms for adversarial bandits. In Conference On Learning Theory, pages 1263–1291. PMLR, 2018.