Beyond Theorems: A Counterexample to Potential Markov Game Criteria (2405.08206v1)
Abstract: There are only limited classes of multi-player stochastic games in which independent learning is guaranteed to converge to a Nash equilibrium. Markov potential games are a key example of such classes. Prior work has outlined sets of sufficient conditions for a stochastic game to qualify as a Markov potential game. However, these conditions often impose strict limitations on the game's structure and tend to be challenging to verify. To address these limitations, Mguni et al. [12] introduce a relaxed notion of Markov potential games and offer an alternative set of necessary conditions for categorizing stochastic games as potential games. Under these conditions, the authors claim that a deterministic Nash equilibrium can be computed efficiently by solving a dual Markov decision process. In this paper, we refute this claim by presenting a counterexample.
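To make the disputed procedure concrete, below is a minimal sketch (in Python) of the pipeline the claim describes: treat a candidate potential function as the reward of a single "dual" MDP over joint actions, solve that MDP, read off the greedy deterministic joint policy, and then test whether it is actually a Nash equilibrium. This is not the paper's construction: the instance (`R1`, `R2`, the transition kernel `P`, and the candidate potential `PHI`) is a randomly generated toy game, and all names are illustrative assumptions.

```python
import numpy as np

# Hypothetical two-player, two-state, two-action discounted stochastic game.
# Arrays are indexed [s, a1, a2]; this is NOT the paper's counterexample.
S, A, GAMMA = 2, 2, 0.9                      # states, actions per player, discount
rng = np.random.default_rng(0)

R1 = rng.uniform(0, 1, (S, A, A))            # player 1 reward (assumed)
R2 = rng.uniform(0, 1, (S, A, A))            # player 2 reward (assumed)
P = rng.dirichlet(np.ones(S), (S, A, A))     # P[s, a1, a2] is a distribution over s'
PHI = R1 + R2                                # candidate potential (an assumption)

def solve_dual_mdp(reward, tol=1e-10):
    """Value iteration on the MDP whose 'actions' are joint actions."""
    V = np.zeros(S)
    while True:
        Q = reward + GAMMA * P @ V           # Q[s, a1, a2]
        V_new = Q.reshape(S, -1).max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q
        V = V_new

# Greedy deterministic *joint* policy of the potential MDP: joint[s] = (a1, a2).
Q_phi = solve_dual_mdp(PHI)
joint = np.stack(np.unravel_index(Q_phi.reshape(S, -1).argmax(axis=1), (A, A)), axis=1)

def joint_policy_value(reward):
    """A player's value when both players follow the joint policy."""
    r = np.array([reward[s, joint[s, 0], joint[s, 1]] for s in range(S)])
    p = np.array([P[s, joint[s, 0], joint[s, 1]] for s in range(S)])
    return np.linalg.solve(np.eye(S) - GAMMA * p, r)

def best_response_value(reward, opp_actions, opp_axis, iters=5000):
    """A player's optimal value against the opponent's fixed half of the policy."""
    # Fixing the opponent's action in every state leaves a single-agent MDP.
    r = np.array([np.take(reward[s], opp_actions[s], axis=opp_axis) for s in range(S)])
    p = np.array([np.take(P[s], opp_actions[s], axis=opp_axis) for s in range(S)])
    V = np.zeros(S)
    for _ in range(iters):
        V = (r + GAMMA * p @ V).max(axis=1)
    return V

# Nash check: a positive gap means some player gains by deviating, i.e. the
# extracted policy is NOT an equilibrium -- the failure mode at issue here.
gap1 = best_response_value(R1, joint[:, 1], opp_axis=1) - joint_policy_value(R1)
gap2 = best_response_value(R2, joint[:, 0], opp_axis=0) - joint_policy_value(R2)
print("max deviation gain, player 1:", gap1.max())
print("max deviation gain, player 2:", gap2.max())
```

On a genuine Markov potential game with an exact potential, both gaps should vanish up to numerical tolerance; the paper's counterexample is a game satisfying the relaxed conditions of Mguni et al. [12] for which the dual-MDP policy nonetheless leaves a strictly positive deviation gain.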
- Lawrence E Blume. 1995. The statistical mechanics of best-response strategy revision. Games and Economic Behavior 11, 2 (1995), 111–145.
- Vivek S Borkar. 2002. Reinforcement learning in Markovian evolutionary games. Advances in Complex Systems 5, 01 (2002), 55–72.
- Constantinos Daskalakis, Noah Golowich, and Kaiqing Zhang. 2023. The complexity of Markov equilibrium in stochastic games. In Proceedings of the 36th Annual Conference on Learning Theory (COLT). 4180–4234.
- Arlington M Fink. 1964. Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A-I (Mathematics) 28, 1 (1964), 89–93.
- Jakob Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. 2018. Learning with Opponent-Learning Awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). 122–130.
- Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip HS Torr, Pushmeet Kohli, and Shimon Whiteson. 2017. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning (ICML). 1146–1155.
- Roy Fox, Stephen McAleer, Will Overman, and Ioannis Panageas. 2022. Independent natural policy gradient always converges in Markov potential games. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS). 4414–4425.
- Hongyi Guo, Zuyue Fu, Zhuoran Yang, and Zhaoran Wang. 2021. Decentralized single-timescale actor-critic on zero-sum two-player stochastic games. In Proceedings of the 38th International Conference on Machine Learning (ICML). 3899–3909.
- Junling Hu and Michael P Wellman. 2003. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4 (2003), 1039–1069.
- Stefanos Leonardos, Will Overman, Ioannis Panageas, and Georgios Piliouras. 2021. Global convergence of multi-agent policy gradient in Markov potential games. arXiv preprint arXiv:2106.01969 (2021).
- Sergio Valcarcel Macua, Javier Zazo, and Santiago Zazo. 2018. Learning parametric closed-loop policies for Markov potential games. arXiv preprint arXiv:1802.00899 (2018).
- David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, and Jun Wang. 2021. Learning in nonzero-sum stochastic games with potentials. In Proceedings of the 38th International Conference on Machine Learning (ICML). 7688–7699.
- Dov Monderer and Lloyd S Shapley. 1996. Potential games. Games and Economic Behavior 14, 1 (1996), 124–143.
- Julien Pérolat, Florian Strub, Bilal Piot, and Olivier Pietquin. 2017. Learning Nash equilibrium for general-sum Markov games from batch data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). 232–241.
- Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT Press.
- Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8 (1992), 279–292.
- Christopher John Cornish Hellaby Watkins. 1989. Learning from delayed rewards. Ph.D. Dissertation. King's College, University of Cambridge.