Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization (2306.08900v1)
Abstract: Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature on the single-agent case, offline multi-agent RL remains relatively underexplored. Most existing methods directly apply offline RL ingredients to the multi-agent setting without fully leveraging the decomposable problem structure, which leads to less satisfactory performance on complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components while maintaining credit assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly performs the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Through comprehensive evaluations on offline multi-agent StarCraft II micromanagement tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
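The abstract names two mechanisms: coupled value factorization and in-sample learning of local state-values. The Python sketch below illustrates both under stated assumptions. It uses the simplest possible coupling (one linear mixer whose non-negative weights and bias are shared by Q_tot and V_tot), and it swaps in IQL-style expectile regression as the in-sample learner. Neither the mixer form nor the hyperparameters (`tau`, `lr`, `steps`) are taken from the paper, and the helper names `mix` and `fit_local_value` are hypothetical.

```python
# Minimal sketch (hypothetical; not the OMAC authors' code).
# Assumption 1: Q_tot and V_tot are mixed linearly with ONE shared set of
#   non-negative per-agent weights w_i and bias b, so each agent receives
#   the same credit in both the state-value and Q-value decompositions.
# Assumption 2: each local V_i is fit by expectile regression on Q-values of
#   actions that appear in the dataset only (IQL-style in-sample learning).


def mix(local_values, weights, bias):
    """Shared linear mixer: sum_i w_i * x_i + b, with w_i >= 0 for monotonicity."""
    if any(w < 0 for w in weights):
        raise ValueError("mixing weights must be non-negative")
    return sum(w * x for w, x in zip(weights, local_values)) + bias


def fit_local_value(dataset_q_values, tau, lr=0.1, steps=2000):
    """Fit a scalar V_i by gradient descent on the expectile loss
    mean(|tau - 1(q - v < 0)| * (q - v)**2) over in-dataset Q-values.

    tau = 0.5 recovers the mean; as tau -> 1 the solution approaches the
    largest Q-value among dataset actions, an implicit in-sample max-Q
    that never queries out-of-distribution actions.
    """
    v = 0.0
    n = len(dataset_q_values)
    for _ in range(steps):
        grad = 0.0
        for q in dataset_q_values:
            diff = q - v
            weight = tau if diff >= 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # derivative of weight * diff**2 w.r.t. v
        v -= lr * grad / n
    return v


if __name__ == "__main__":
    # Hypothetical in-dataset local Q-values for two agents.
    agent_q = [[1.0, 2.0, 3.0], [0.5, 1.5]]
    v_local = [fit_local_value(qs, tau=0.9) for qs in agent_q]

    weights, bias = [0.5, 1.5], 0.2  # hypothetical shared mixer outputs
    q_local = [2.0, 1.5]  # local Q-values of one dataset joint action
    print("V_tot =", round(mix(v_local, weights, bias), 3))
    print("Q_tot =", round(mix(q_local, weights, bias), 3))
```

Because both totals pass through the same `mix` call with the same weights, an agent with a large weight dominates both V_tot and Q_tot; that shared weighting is the sense of "credit assignment consistency" assumed in this sketch. Raising `tau` toward 1 pushes each V_i toward the best in-dataset local Q-value without ever evaluating an unseen action.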