LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models (2310.03903v3)

Published 5 Oct 2023 in cs.CL and cs.MA

Abstract: LLMs have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from Agentic Coordination experiments reveal that LLM-Agents excel in multi-agent coordination settings where decision-making primarily relies on environmental variables but face challenges in scenarios requiring active consideration of partners' beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs' Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as Agents in pure coordination setups and underscore areas for improvement. Code Available at https://github.com/eric-ai-lab/llm_coordination.


Summary

  • The paper presents the LLM-Coordination benchmark showing that LLMs using the CAC framework achieve competitive coordination, particularly in zero-shot scenarios.
  • It employs a novel Cognitive Architecture for Coordination (CAC) with memory, reasoning, and grounding modules to convert game states into actionable insights.
  • The evaluation reveals strong environment comprehension but indicates a need for improved Theory of Mind reasoning and joint planning in LLMs.

LLM-Coordination: Evaluating Multi-agent Coordination Abilities in LLMs

Introduction

The paper introduces the LLM-Coordination Benchmark, aimed at evaluating the multi-agent coordination abilities of LLMs within the context of Pure Coordination Games. These games, which include scenarios such as cooperative card games and team-based strategy tasks, require agents to align their actions precisely for optimal results. The benchmark is specifically designed to assess LLM capabilities in two key areas: Agentic Coordination and Coordination Question Answering (QA).

Figure 1: The LLM-Coordination Benchmark consists of two tasks: Agentic Coordination, which studies the ability of LLMs to act, and Coordination QA, which studies the ability of LLMs to reason.

Cognitive Architecture for Coordination (CAC)

A novel Cognitive Architecture for Coordination (CAC) framework is proposed, enabling LLMs to engage in multi-agent coordination tasks as plug-and-play modules. CAC is structured into three core components: Memory, Reasoning, and Grounding. Together, these components let LLM-based agents interact with coordination games end to end, translating game states into text that the underlying LLM can reason over.

Figure 2: Cognitive Architecture for Coordination (CAC). This framework is segmented into three key components—Memory, Grounding, and Reasoning.
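
To make the architecture concrete, the following Python sketch shows one way the three modules could be wired together. It is a minimal illustration under stated assumptions, not the paper's implementation: the names (Memory, ground_state, reason) and the assumption that the LLM is a plain prompt-to-text callable are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Rolling record of past observations and actions (illustrative)."""
    history: list = field(default_factory=list)

    def add(self, observation: str, action: str) -> None:
        self.history.append({"observation": observation, "action": action})

    def summary(self, last_n: int = 5) -> str:
        recent = self.history[-last_n:]
        return "\n".join(f"obs: {h['observation']} -> action: {h['action']}" for h in recent)

def ground_state(raw_state: dict) -> str:
    """Grounding: convert a structured game state into a textual description."""
    return "\n".join(f"{key}: {value}" for key, value in raw_state.items())

def reason(llm, state_text: str, memory_text: str, legal_actions: list) -> str:
    """Reasoning: ask the LLM to pick one of the legal actions given state and memory."""
    prompt = (
        "You are a cooperative agent in a pure coordination game.\n"
        f"Recent history:\n{memory_text}\n"
        f"Current state:\n{state_text}\n"
        f"Legal actions: {legal_actions}\n"
        "Think step by step, then end with exactly one legal action."
    )
    response = llm(prompt)  # `llm` is any callable mapping a prompt string to text
    # Ground the free-form answer back onto a legal action (naive string match).
    for action in legal_actions:
        if action.lower() in response.lower():
            return action
    return legal_actions[0]  # fall back to a default legal action if nothing matches
```

The string-matching fallback at the end stands in for whatever action-grounding step the actual system uses to keep the agent's output executable.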

Agentic Coordination

In the Agentic Coordination task, LLMs participate directly in coordination games by taking actions and reacting to dynamic game states. For this purpose, the CAC framework facilitates end-to-end interaction with the game environment, leveraging memory modules and reasoning capabilities inherent in LLMs. Through experimentation, it has been observed that LLM agents demonstrate a remarkable understanding of game objectives and can generate coherent action strategies when interfaced with the CAC system.
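
A schematic game loop built on the components sketched above might look as follows; the env interface (reset, step, legal_actions) is an assumed stand-in for whatever wrapper the benchmark games expose, not an API defined by the paper.

```python
def run_episode(env, llm, max_steps: int = 100) -> float:
    """Drive one coordination episode with an LLM agent (schematic)."""
    memory = Memory()
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        state_text = ground_state(state)
        action = reason(llm, state_text, memory.summary(), env.legal_actions())
        state, reward, done = env.step(action)  # assumed Gym-like step signature
        memory.add(state_text, action)
        total_reward += reward
        if done:
            break
    return total_reward
```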

Zero-Shot Coordination

One of the critical insights revealed by this benchmark is the LLMs' robustness in zero-shot coordination scenarios. Unlike reinforcement learning (RL) methods, which struggle with unseen partners because self-play training overfits to specific conventions, LLM agents maintain coordinated behavior and adapt to new partners. The studies indicate that LLMs, particularly when powered by models like GPT-4-turbo, achieve competitive performance without specialized training on or exposure to specific game examples.
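
Zero-shot coordination is typically measured with cross-play: each agent is paired with partners it never trained or interacted with, and scores are averaged over episodes. The sketch below illustrates that bookkeeping only; evaluate is an assumed callable that plays one two-player episode and returns the joint score.

```python
from itertools import product
from statistics import mean

def cross_play(agents: dict, partners: dict, evaluate, episodes: int = 5) -> dict:
    """Average joint return of every agent paired with every unseen partner."""
    scores = {}
    for (a_name, agent), (p_name, partner) in product(agents.items(), partners.items()):
        returns = [evaluate(agent, partner) for _ in range(episodes)]
        scores[(a_name, p_name)] = mean(returns)
    return scores

# Usage idea: agents = {"gpt4_cac": llm_agent}, partners = {"ppo_selfplay": rl_policy}
```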

Coordination QA

The Coordination QA suite evaluates LLMs' reasoning about coordination games through targeted multiple-choice questions covering three cognitive areas: Environment Comprehension (EC), Theory of Mind (ToM) Reasoning, and Joint Planning (JP). The benchmark results show that while LLMs are proficient at environment-related queries, significant room for improvement remains in ToM reasoning and the more complex joint planning.

Figure 3: Comparative Performance of LLMs in Three Cognitive Dimensions. The graphs display the accuracy of each LLM in EC, ToM Reasoning, and JP, plotted against the model's number of parameters.
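
Scoring such a suite reduces to per-category accuracy over multiple-choice items. The helper below assumes a hypothetical question format (dicts with category, prompt, options, and answer keys) and an answer_fn that returns the model's chosen option; both are illustrative assumptions rather than the benchmark's actual interface.

```python
from collections import defaultdict

def score_coordqa(questions: list, answer_fn) -> dict:
    """Per-category accuracy (EC, ToM, JP) on multiple-choice coordination questions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        prediction = answer_fn(q["prompt"], q["options"])
        total[q["category"]] += 1
        if prediction == q["answer"]:
            correct[q["category"]] += 1
    return {category: correct[category] / total[category] for category in total}
```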

Experimental Results and Analysis

The paper reports competitive results in the Agentic Coordination tasks, with LLMs showing comparable performance to the state-of-the-art RL methods in games that emphasize common-sense reasoning. Nevertheless, in games requiring advanced ToM reasoning, LLMs were found to be less effective, indicating a potential research avenue for enhancing LLM reasoning capabilities. The implementation of auxiliary ToM inference and verification mechanisms within the CAC further improves coordination reliability and partner adaptability.
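
One plausible reading of "auxiliary ToM inference and verification" is a pipeline that first asks the model what the partner intends, proposes an action conditioned on that belief, and then checks the proposal for conflicts before committing. The sketch below (reusing reason from the earlier CAC sketch) is an assumption about such a mechanism, not the paper's exact procedure.

```python
def act_with_tom(llm, state_text: str, legal_actions: list) -> str:
    """Belief -> proposal -> verification for partner-aware action selection (schematic)."""
    belief = llm(
        "Given this coordination game state, what is your partner most likely "
        f"trying to do next?\n{state_text}"
    )
    proposal = reason(llm, state_text + f"\nPartner likely intends: {belief}", "", legal_actions)
    verdict = llm(
        f"State:\n{state_text}\nPartner intent: {belief}\nProposed action: {proposal}\n"
        "Does this action conflict with the partner's plan? Answer yes or no."
    )
    if verdict.strip().lower().startswith("yes"):
        # Re-query without the rejected proposal (illustrative fallback policy).
        remaining = [a for a in legal_actions if a != proposal] or legal_actions
        proposal = reason(llm, state_text, "", remaining)
    return proposal
```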

Conclusion

The LLM-Coordination Benchmark provides an innovative framework for assessing the multi-agent coordination skills of LLMs, outlining their current strengths and identifying areas requiring enhancement, particularly in reasoning and planning abilities. The introduction of the CAC framework showcases the potential of LLM agents as robust alternatives to conventional RL methods, emphasizing the importance of intuitive reasoning and environmental adaptability. Future developments could explore augmenting LLMs with more sophisticated reasoning modules and enhancing the integration of theory of mind capabilities to bring them closer to human-like interaction proficiencies.
