
Best-of-Both-Worlds Linear Contextual Bandits (2312.16489v1)

Published 27 Dec 2023 in cs.LG, cs.AI, econ.EM, stat.ME, and stat.ML

Abstract: This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs a loss corresponding to the selected arm. The decision-maker aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. Then, we propose a strategy that extends the RealLinExp3 by Neu & Olkhovskaya (2020) and the Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in \mathbb{N}$ is the number of rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C \in [0, T]$ is an adversarial corruption parameter. This regret upper bound implies $O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and $O\left(\sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both stochastic and adversarial regimes.
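The interaction protocol described in the abstract (observe an i.i.d. context, select an arm via an FTRL-style policy, incur a linear loss) can be sketched as a minimal simulation. This is an illustrative toy, not the paper's BoBW RealFTRL: it uses a plain ridge-regression plug-in estimator and an entropy-regularized FTRL update (exponential weights), whereas the actual algorithm relies on the importance-weighted estimators of RealLinExp3. All variable names and the learning-rate schedule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d, T = 3, 5, 2000                     # arms, context dimension, rounds
theta = rng.normal(size=(K, d)) * 0.1    # unknown per-arm loss parameters

# Per-arm regularized least-squares statistics (illustrative plug-in
# estimator, not the paper's importance-weighted RealLinExp3 estimator).
A = [np.eye(d) for _ in range(K)]        # Gram matrices (ridge-regularized)
b = [np.zeros(d) for _ in range(K)]
cum_est_loss = np.zeros(K)               # cumulative estimated losses for FTRL

total_loss = 0.0
for t in range(1, T + 1):
    x = rng.normal(size=d) / np.sqrt(d)  # i.i.d. context
    eta = 1.0 / np.sqrt(t)               # assumed decreasing learning rate

    # FTRL with a negative-entropy regularizer reduces to exponential
    # weights over the cumulative estimated losses.
    z = -eta * cum_est_loss
    p = np.exp(z - z.max())
    p /= p.sum()

    a = rng.choice(K, p=p)                     # sample an arm from the policy
    loss = theta[a] @ x + 0.01 * rng.normal()  # observed noisy linear loss
    total_loss += theta[a] @ x

    # Update the chosen arm's least-squares statistics.
    A[a] += np.outer(x, x)
    b[a] += loss * x

    # Plug-in estimate of every arm's loss on this context.
    for k in range(K):
        theta_hat = np.linalg.solve(A[k], b[k])
        cum_est_loss[k] += theta_hat @ x
```

The entropy-regularized FTRL step is the common thread with the paper's strategy; the best-of-both-worlds guarantee itself comes from the specific estimator and adaptive learning-rate design analyzed there, which this sketch does not reproduce.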

References (38)
  1. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2011.
  2. Associative reinforcement learning using linear probabilistic concepts. In International Conference on Machine Learning (ICML), 1999.
  3. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2016.
  4. Online decision making with high-dimensional covariates. Operations Research, 68(1), 2020.
  5. Contextual bandit algorithms with supervised learning guarantees. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  6. The best of both worlds: Stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2012.
  7. Efficient and robust high-dimensional linear contextual bandits. In International Joint Conference on Artificial Intelligence (IJCAI), 2020.
  8. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  9. Stochastic linear optimization under bandit feedback. In Conference on Learning Theory (COLT), 2008.
  10. Best of both worlds policy optimization. In International Conference on Machine Learning (ICML), 2023.
  11. Robust stochastic linear contextual bandits under adversarial attacks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
  12. A linear response bandit problem. Stochastic Systems, 2013.
  13. Better algorithms for stochastic bandits with adversarial corruptions. In Conference on Learning Theory (COLT), 2019.
  14. Online learning with low rank experts. In Conference on Learning Theory (COLT), 2016.
  15. Nearly optimal algorithms for linear contextual bandits with adversarial corruptions. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  16. Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  17. Improved best-of-both-worlds guarantees for multi-armed bandits: Ftrl with general regularizers and multiple optimal arms. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
  18. Learning hurdles for sleeping experts. ACM Transactions on Computation Theory, 6(3), 2014.
  19. Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm. In Conference on Learning Theory (COLT), 2023.
  20. The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
  21. Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously. In International Conference on Machine Learning (ICML), 2021.
  22. Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit. Electronic Journal of Statistics, 15(2):5652 – 5695, 2021.
  23. A contextual-bandit approach to personalized news article recommendation. In International Conference on World Wide Web (WWW), 2010.
  24. Bypassing the simulator: Near-optimal adversarial linear contextual bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
  25. Competitive caching with machine learned advice. In International Conference on Machine Learning (ICML), 2018.
  26. Efficient and robust algorithms for adversarial linear contextual bandits. In Conference on Learning Theory (COLT), 2020.
  27. Bistro: An efficient relaxation-based method for contextual bandits. In International Conference on Machine Learning (ICML), 2016.
  28. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.
  29. An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In Conference on Learning Theory (COLT), 2017.
  30. One practical algorithm for both stochastic and adversarial bandits. In International Conference on Machine Learning (ICML), 2014.
  31. Improved regret bounds for oracle-based adversarial contextual bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
  32. From ads to interventions: Contextual bandits in mobile health. In Mobile Health: Sensors, Analytic Methods, and Applications, pp.  495–517, 2017.
  33. Best-of-both-worlds algorithms for partial monitoring. In International Conference on Algorithmic Learning Theory (ALT), 2023a.
  34. Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds. In Advances in Neural Information Processing Systems (NeurIPS), 2023b.
  35. Minimax concave penalized multi-armed bandit model with high-dimensional covariates. In International Conference on Machine Learning (ICML), 2018.
  36. More adaptive algorithms for adversarial bandits. In Conference on Learning Theory (COLT), 2018.
  37. Linear contextual bandits with adversarial corruptions, 2021. URL https://openreview.net/forum?id=Wz-t1oOTWa.
  38. Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(1), 2021.
