POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning (2405.08036v5)

Published 13 May 2024 in cs.LG and cs.AI

Abstract: Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes potentially optimal joint actions and assigns higher weights to the corresponding losses during training. We theoretically prove that, with such a weighted training approach, the optimal policy is guaranteed to be recovered. Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms state-of-the-art value-based multi-agent reinforcement learning methods.
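
To make the weighting idea in the abstract concrete, below is a minimal PyTorch sketch of a TD loss that up-weights joint actions flagged as potentially optimal. This is an illustration under stated assumptions, not the authors' implementation: the function name powqmix_style_loss, the boolean mask potentially_optimal, and the down-weight alpha are all hypothetical, and the recognizer that produces the mask (the paper's actual contribution) is not reproduced here.

    import torch

    def powqmix_style_loss(q_tot, td_target, potentially_optimal, alpha=0.1):
        # q_tot:               (batch,) joint action values from the monotonic mixer
        # td_target:           (batch,) bootstrapped TD targets (treated as constants)
        # potentially_optimal: (batch,) bool mask from a recognizer that flags joint
        #                      actions judged potentially optimal (hypothetical input)
        # alpha:               smaller weight applied to all other joint actions
        weights = torch.where(potentially_optimal,
                              torch.ones_like(q_tot),
                              torch.full_like(q_tot, alpha))
        td_error = q_tot - td_target.detach()
        # Up-weighting the flagged joint actions biases the constrained, factorized
        # value function toward fitting them accurately, which is the mechanism the
        # paper's recovery guarantee relies on.
        return (weights * td_error.pow(2)).mean()

In a training loop, a loss of this shape would replace the uniform QMIX TD loss; everything else in the QMIX pipeline (agent networks, hypernetwork mixer, target networks) would be unchanged.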
