Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints (2403.05786v3)

Published 9 Mar 2024 in cs.LG and math.OC

Abstract: We study the problem of online convex optimization (OCO) under unknown linear constraints that are either static or stochastically time-varying. For this problem, we introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show that it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation. In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions. In the case of stochastic time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation under more general convex constraints and a different set of assumptions. In addition to our theoretical guarantees, we also give numerical results that further validate the effectiveness of our approach.
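
For orientation, the quantities named in the abstract can be written out in the standard OCO form for the static case. This is a sketch under assumed notation, not the paper's own statement: the symbols $f_t$ (round-$t$ convex loss), $x_t$ (the learner's action), $\mathcal{X}$ (known action set), and $A$, $b$ (the unknown linear constraint) are illustrative and do not appear in the abstract.

```latex
% Regret against the best fixed action that satisfies the constraint
% (assumed notation; the abstract does not define these symbols).
R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X},\; A x \le b} \; \sum_{t=1}^{T} f_t(x)

% "No constraint violation": every played action is feasible.
A x_t \le b \quad \text{for all } t = 1, \dots, T

% Size of the improvement from T^{2/3} to \sqrt{T}, ignoring log factors:
\frac{T^{2/3}}{T^{1/2}} = T^{1/6},
\qquad \text{e.g. } T = 10^{6}: \quad T^{2/3} = 10^{4}, \quad \sqrt{T} = 10^{3}
```

Under this reading, the gap between the two rates grows as $T^{1/6}$, so the improvement is a factor of about 10 at a horizon of one million rounds, before constants and logarithmic terms.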

