Factored Online Planning in Many-Agent POMDPs (2312.11434v3)
Abstract: In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation methods have been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to settings with many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we extend an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.
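The weighted particle filtering mentioned above can be illustrated with a generic belief update: each particle is propagated through the transition model, then reweighted by the likelihood of the received observation. The toy corridor domain, the `transition` and `obs_likelihood` models, and all parameter values below are illustrative assumptions, not the paper's models; this is a minimal sketch of the standard weighted update, not the authors' planner.

```python
import random

# Hypothetical toy POMDP: an agent on a 1-D corridor moves left/right and
# receives a noisy position reading. All models here are assumptions for
# illustration only.

def transition(state, action):
    # Move by `action`, with a small chance of staying put.
    return state + action if random.random() < 0.9 else state

def obs_likelihood(obs, state):
    # Probability of reading `obs` when the true position is `state`.
    return 0.8 if obs == state else 0.1

def update_belief(particles, weights, action, obs):
    """One weighted particle filter step: propagate each particle through
    the transition model, then reweight by the observation likelihood
    and renormalize."""
    new_particles = [transition(s, action) for s in particles]
    new_weights = [w * obs_likelihood(obs, s)
                   for w, s in zip(weights, new_particles)]
    total = sum(new_weights)
    if total == 0:  # observation impossible under every particle
        n = len(new_particles)
        return new_particles, [1.0 / n] * n
    return new_particles, [w / total for w in new_weights]

random.seed(0)
particles, weights = [0, 1, 2, 3], [0.25] * 4
particles, weights = update_belief(particles, weights, action=1, obs=2)
```

In a sample-based online planner, this update replaces rejection-based belief tracking: instead of discarding simulations whose sampled observation mismatches the real one, every particle is kept and downweighted by how well it explains the observation, which matters when the joint observation space is exponentially large.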