Independent Learning in Constrained Markov Potential Games (2402.17885v1)
Abstract: Constrained Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems in which the agents' behavior is subject to constraints. In this work, we focus on the recently introduced class of constrained Markov Potential Games. While centralized algorithms have been proposed for solving such constrained games, the design of convergent independent learning algorithms tailored to the constrained setting remains an open question. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: each agent observes only its own actions and rewards, along with a shared state. Inspired by the optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Notably, our algorithm can be implemented independently, without a centralized coordination mechanism that requires turn-based agent updates. Under suitable technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We present simulations illustrating our results.
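To make the mechanism concrete, here is a minimal numpy sketch of a proximal-point scheme with a switching-gradient inner loop, run on a hypothetical two-player, single-state potential game with one expected-cost constraint per agent. Everything here is an illustrative assumption rather than the paper's implementation: the potential matrix `A`, the cost vectors and budget `b`, the step sizes, and all function names are made up for the example, and exact gradients stand in for the stochastic estimates used in the paper (in the truly independent setting, each agent would estimate its partial gradient from its own observed rewards rather than from the joint strategy).

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Toy two-player, single-state game: agent i plays a mixed strategy x[i]
# over 2 actions. The shared potential is phi(x) = x0 . A x1, and agent i
# has an expected-cost constraint <cost_i, x_i> <= b. All values are
# illustrative assumptions, not the paper's setup.
A = np.array([[1.0, 0.0], [0.0, 2.0]])     # illustrative potential matrix
cost = np.array([[0.2, 1.0], [0.1, 0.9]])  # high-potential action is costly
b = 0.5                                     # expected-cost budget

def grad_potential(x, i):
    # Partial gradient of the bilinear potential w.r.t. agent i's strategy
    # (depends only on the opponent's anchored strategy).
    return A @ x[1] if i == 0 else A.T @ x[0]

def switching_prox_step(x, i, x_anchor, eta=1.0, lr=0.05, tol=1e-2, inner=200):
    """Inexactly solve agent i's proximal subproblem with a switching
    (sub)gradient rule: step on the constraint when it is violated beyond
    tol, otherwise on the proximally regularized objective."""
    xi = x[i].copy()
    for _ in range(inner):
        violation = cost[i] @ xi - b
        if violation > tol:
            g = -cost[i]                      # reduce constraint violation
        else:
            g = grad_potential(x, i) - (xi - x_anchor[i]) / eta
        xi = project_simplex(xi + lr * g)     # ascent step, then project
    return xi

# Outer proximal-point loop: all agents update simultaneously and
# independently; no turn-based coordination is needed.
rng = np.random.default_rng(0)
x = [project_simplex(rng.random(2)) for _ in range(2)]
for t in range(50):
    anchor = [xi.copy() for xi in x]
    x = [switching_prox_step(x, i, anchor) for i in range(2)]

for i in range(2):
    print(f"agent {i}: strategy {x[i].round(3)}, "
          f"expected cost {cost[i] @ x[i]:.3f} (budget {b})")
```

The switching rule is the key design choice this sketch tries to convey: each inner step works on feasibility whenever the estimated constraint violation exceeds the tolerance and on the proximally regularized objective otherwise, so no dual variables need to be maintained.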