Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints (2403.05786v3)
Abstract: We study the problem of online convex optimization (OCO) under unknown linear constraints that are either static or stochastically time-varying. For this problem, we introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show that it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation. In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions. In the case of stochastic time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation under more general convex constraints and a different set of assumptions. In addition to our theoretical guarantees, we also give numerical results that further validate the effectiveness of our approach.
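The abstract does not describe how OSOCO works, so the sketch below is not OSOCO. It is a minimal, hypothetical illustration of the problem setting only, under loudly stated assumptions: a single unknown constraint $a^\top x \le b$ with known $b > 0$ (so the origin is a known safe point), noisy feedback $y_t = a^\top x_t + \text{noise}$, squared losses on the known box $[-1,1]^d$, and a regularized least-squares confidence ellipsoid with a hand-picked radius `beta`. All names (`pessimistic_scale`, `run`, `beta`, `lam`) are invented for illustration. The learner takes an online gradient step, then shrinks the result toward the safe point until it satisfies the worst-case (pessimistic) version of the estimated constraint. This conservative rule is a heuristic; no regret bound is claimed for it.

```python
import numpy as np


def pessimistic_scale(x, a_hat, V_inv, beta, b):
    """Largest alpha in [0, 1] such that alpha * x satisfies the pessimistic
    constraint  a_hat @ (alpha x) + beta * ||alpha x||_{V^{-1}} <= b.

    Assumes b > 0, so the origin is a known, strictly safe point.
    """
    width = np.sqrt(max(float(x @ V_inv @ x), 0.0))  # ||x||_{V^{-1}}
    worst = float(a_hat @ x) + beta * width           # worst case over the ellipsoid
    if worst <= b:
        return 1.0
    # Here worst > b > 0. Both terms scale linearly in alpha >= 0, so
    # alpha = b / worst puts alpha * x exactly on the pessimistic boundary.
    return b / worst


def run(a_true, b=0.5, T=2000, noise=0.05, beta=2.0, lam=1.0, seed=0):
    """Hypothetical toy loop: losses f_t(x) = ||x - z_t||^2 on [-1, 1]^d and one
    unknown constraint a_true @ x <= b, observed only through noisy feedback."""
    rng = np.random.default_rng(seed)
    d = a_true.shape[0]
    V = lam * np.eye(d)  # regularized Gram matrix of played actions
    s = np.zeros(d)      # running sum of y_t * x_t
    x = np.zeros(d)      # gradient-step iterate: inside the box, maybe not safe
    violations = 0
    for t in range(1, T + 1):
        V_inv = np.linalg.inv(V)
        a_hat = V_inv @ s  # regularized least-squares estimate of a_true
        x_play = pessimistic_scale(x, a_hat, V_inv, beta, b) * x
        violations += int(a_true @ x_play > b)  # true violation (hidden from learner)
        z = rng.uniform(-1.0, 1.0, size=d)      # loss target revealed after playing
        y = a_true @ x_play + noise * rng.standard_normal()  # noisy constraint feedback
        V += np.outer(x_play, x_play)
        s += y * x_play
        # Gradient of ||x - z||^2 at x_play is 2 (x_play - z); step size eta_t,
        # followed by Euclidean projection (clipping) onto the known box.
        eta = 1.0 / (2.0 * np.sqrt(t))
        x = np.clip(x_play - eta * 2.0 * (x_play - z), -1.0, 1.0)
    return violations, np.linalg.inv(V) @ s


if __name__ == "__main__":
    v, a_est = run(np.array([1.0, 0.5]))
    print("violations:", v, "estimate of a:", a_est)
```

Scaling toward the safe point, instead of projecting onto the pessimistic set, reduces safety to one scalar computation because the pessimistic constraint is positively homogeneous in $x$. With the default `beta`, early actions are heavily shrunk; a principled radius would come from a self-normalized confidence bound, which this sketch does not compute, so it neither reproduces nor tests the guarantees stated in the abstract.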