
Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization (2306.08900v1)

Published 15 Jun 2023 in cs.LG and cs.MA

Abstract: Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL remains a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance on complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Through comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
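
The abstract combines two mechanisms: a value factorization in which the global Q-function and state-value function are decomposed into per-agent (local) and shared components under a consistent credit assignment, and in-sample learning of the local state-value functions that approximates a local max-Q operation without querying out-of-distribution actions. The PyTorch sketch below illustrates one plausible reading of this setup; the linear mixing form, the Softplus-positive weights, the shared w_net/b_net modules, and the expectile parameter tau are illustrative assumptions, not the paper's exact parameterization.

    import torch
    import torch.nn as nn

    class CoupledFactorization(nn.Module):
        """Illustrative coupled factorization (assumed form, not OMAC's exact
        scheme): global Q and V are mixed from per-agent components with the
        SAME state-conditioned weights, so the credit assignment implied by
        Q_tot and V_tot stays consistent."""

        def __init__(self, n_agents, state_dim, obs_dim, n_actions):
            super().__init__()
            self.q_net = nn.Linear(obs_dim, n_actions)  # local Q_i(o_i, .); shared across agents for brevity
            self.v_net = nn.Linear(obs_dim, 1)          # local V_i(o_i)
            self.w_net = nn.Sequential(                 # shared mixing weights w(s) >= 0
                nn.Linear(state_dim, n_agents), nn.Softplus())
            self.b_net = nn.Linear(state_dim, 1)        # shared state-dependent bias b(s)

        def forward(self, state, obs, actions):
            # state: (B, state_dim), obs: (B, n_agents, obs_dim), actions: (B, n_agents) long
            q_local = self.q_net(obs).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
            v_local = self.v_net(obs).squeeze(-1)
            w = self.w_net(state)                       # one non-negative weight per agent
            b = self.b_net(state).squeeze(-1)
            q_tot = (w * q_local).sum(-1) + b           # the same w and b couple both mixes
            v_tot = (w * v_local).sum(-1) + b
            return q_tot, v_tot

    def expectile_loss(v_pred, q_target, tau=0.8):
        """In-sample expectile regression (the IQL-style trick the abstract
        alludes to): fitting V to an upper expectile of dataset Q-values
        approximates a max over in-distribution actions without ever
        evaluating out-of-distribution ones."""
        diff = q_target.detach() - v_pred               # in practice q_target would come from a target network
        weight = torch.abs(tau - (diff < 0).float())    # tau on positive residuals, 1 - tau on negative
        return (weight * diff.pow(2)).mean()

Sharing the non-negative mixing weights between the Q and V mixes is what keeps the two factorizations coupled: both global functions attribute value to agents in the same proportions. Fitting each local state-value to an upper expectile (tau near 1) of in-dataset Q-values then approximates a max over in-distribution local actions, the standard in-sample device from implicit Q-learning.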

