
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models (2310.03903v3)

Published 5 Oct 2023 in cs.CL and cs.MA

Abstract: LLMs have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from Agentic Coordination experiments reveal that LLM-Agents excel in multi-agent coordination settings where decision-making primarily relies on environmental variables but face challenges in scenarios requiring active consideration of partners' beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs' Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as Agents in pure coordination setups and underscore areas for improvement. Code Available at https://github.com/eric-ai-lab/LLM_coordination.
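The CoordQA task described above scores a model's multiple-choice answers separately for each of the three probed abilities. A minimal sketch of that per-category scoring is below; the individual items, letter answers, and predictions are invented for illustration and are not from the benchmark itself.

```python
from collections import defaultdict

# Hypothetical CoordQA-style items: each question is tagged with one of the
# three ability categories named in the paper (these examples are invented).
QA_ITEMS = [
    {"category": "Environment Comprehension", "answer": "B", "prediction": "B"},
    {"category": "ToM Reasoning",             "answer": "A", "prediction": "C"},
    {"category": "Joint Planning",            "answer": "D", "prediction": "D"},
]

def score_by_category(items):
    """Return per-category accuracy for a set of multiple-choice items."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        total[item["category"]] += 1
        if item["prediction"] == item["answer"]:
            correct[item["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

Reporting accuracy per category, rather than one aggregate number, is what lets the benchmark localize weaknesses to ToM reasoning and joint planning specifically.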


Summary

  • The paper introduces the LLM-Co framework that enables GPT-4 to achieve near-human Theory of Mind in multi-agent coordination games.
  • It translates game rules into textual formats to support robust communication and sustained coordination without additional fine-tuning.
  • The study evaluates performance across diverse game environments, showcasing adaptive partner alignment and explicit assistance capabilities.

Insights into Multi-Agent Coordination with LLMs

This paper investigates the potential of LLMs to facilitate multi-agent coordination, a critical component of collaborative artificial intelligence applications. The authors present the LLM-Coordination Framework (LLM-Co) as a method for enabling LLMs to engage effectively in coordination games, and they examine five pertinent aspects of coordination: Theory of Mind (ToM), Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance. The paper evaluates LLMs across several game environments, highlighting their strengths and limitations.

LLM-Coordination Framework

The LLM-Co Framework is designed to enable LLMs, like GPT-4, to interact and perform tasks in dynamic multi-agent game environments. It provides a structured approach by translating game details and rules into a textual format that LLMs can process effectively. The framework supports continuous gameplay across environments by helping LLMs infer actionable steps based on the current game state and the feasible actions available.
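The loop described above (verbalize the state, query the model, validate the reply against the feasible actions) can be sketched as follows. The function names and prompt wording are my own illustration, and `query_llm` is a placeholder for a real model call, not the framework's actual implementation.

```python
def verbalize(state, feasible_actions):
    """Render the game state and legal moves as a text prompt (simplified)."""
    lines = [f"Current state: {state}", "You may choose one of:"]
    lines += [f"- {a}" for a in feasible_actions]
    lines.append("Reply with exactly one action from the list.")
    return "\n".join(lines)

def query_llm(prompt):
    """Placeholder for a real LLM call; here it just picks the first option."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return ""

def choose_action(state, feasible_actions):
    """One step of the loop: verbalize, query the model, validate the reply."""
    reply = query_llm(verbalize(state, feasible_actions)).strip()
    # Fall back to a default legal action if the model's reply is not valid.
    return reply if reply in feasible_actions else feasible_actions[0]
```

Constraining the model to an enumerated action list, with a fallback when the reply is not legal, is what keeps a free-form text generator usable as an agent in a dynamic game loop.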

Game Environments and Evaluations

The evaluations were conducted in three game environments: Collab Escape, Collab Capture, and Overcooked-AI. Each environment presents unique challenges requiring agents to display Theory of Mind, sustained coordination over extended tasks, and the ability to assist explicitly.

  1. Theory of Mind and Situated Reasoning: The paper introduces an LLM-ToM-Reasoning Test Set to measure the ToM and situated reasoning capabilities of LLMs. GPT-4 outperforms other models, approaching near-human reasoning levels and demonstrating its capacity to accurately predict partner intentions.
  2. Sustained Coordination: The LLM-Co agents, particularly those using GPT-4, were capable of sustained coordination, outperforming existing RL-based methods in coordination-heavy tasks without pre-training or task-specific fine-tuning.
  3. Robustness to Partners: LLM-Co agents were evaluated against varied partner types, including RL baselines trained with human data. Results show that they adaptively align with partner behavior without compromising coordination efficiency.
  4. Explicit Assistance: The paper explores scenarios requiring proactive help to improve joint task completion. The authors introduced Overcooked-AI layouts that specifically require explicit assistance, demonstrating that LLM-Co agents adapt to these requirements when given appropriate directive prompts.
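The robustness evaluation in point 3 amounts to cross-play: pairing an agent with a pool of partner policies it was not trained with and averaging episode scores. A toy sketch is below; the episode dynamics (both players score when their moves match) and the example policies are invented purely to show the evaluation shape.

```python
from statistics import mean

def run_episode(agent_policy, partner_policy, steps=5):
    """Toy cooperative episode: the pair scores whenever their moves match."""
    return sum(1 for t in range(steps) if agent_policy(t) == partner_policy(t))

def cross_play(agent_policy, partner_policies, episodes=3):
    """Average an agent's score over a pool of unseen partner policies."""
    scores = [run_episode(agent_policy, p)
              for p in partner_policies
              for _ in range(episodes)]
    return mean(scores)

# Hypothetical policies: a fixed agent paired with two partner types.
always_a = lambda t: "a"
alternate = lambda t: "a" if t % 2 == 0 else "b"

score = cross_play(always_a, [always_a, alternate])
```

Averaging over a diverse partner pool, rather than a single co-trained partner, is what distinguishes a zero-shot coordination result from ordinary self-play performance.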

Implications and Future Developments

The positive outcomes from this research indicate a promising direction for using LLMs in collaborative AI agents. They can process complex instructions, adapt to unforeseen partner actions, and execute long-term plans, making them suitable for real-world multi-agent tasks. Future work will likely explore the scalability of such frameworks across more diverse agents and environments, potentially integrating real-world variables and constraints.

Conclusion

This paper underscores the utility of LLMs, particularly GPT-4, in multi-agent coordination. By developing structured frameworks like LLM-Co and evaluating them against comprehensive scenarios, the research highlights the emergent reasoning capabilities of LLMs in collaborative tasks. These findings lay the groundwork for LLMs to serve as reliable agents in both virtual and real-world applications requiring sophisticated coordination.