A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges (2403.10249v1)
Abstract: The swift evolution of Large-scale Models (LMs), whether language-focused or multi-modal, has garnered extensive attention in both academia and industry. Yet despite the surge of interest in this rapidly evolving area, systematic reviews of their capabilities and potential in distinct, impactful scenarios remain scarce. This paper endeavours to help bridge that gap by offering a thorough examination of the current landscape of LM usage in complex game playing scenarios and of the challenges that remain open. We systematically review the existing architectures of LM-based Agents (LMAs) for games and summarize their commonalities, challenges, and other insights. Furthermore, we present our perspective on promising future research avenues for the advancement of LMs in games. We hope to help researchers gain a clear understanding of the field and to generate more interest in this highly impactful research direction. A corresponding resource, continuously updated, can be found in our GitHub repository.
- LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games. arXiv:2309.17234, 2023.
- Towards grounded dialogue generation in video game environments. 2023.
- A framework for exploring player perceptions of LLM-generated dialogue in commercial video games. In Findings of EMNLP 2023.
- Trevor Ashby and Braden K. Webb et al. Personalized quest and dialogue generation in role-playing games: A knowledge graph- and language model-based approach. In CHI, 2023.
- Video pretraining (VPT): Learning to act by watching unlabeled online videos. NeurIPS, 2022.
- The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013.
- Dota 2 with large scale deep reinforcement learning. arXiv:1912.06680, 2019.
- GROOT: Learning to follow instructions by watching gameplay videos. arXiv:2310.08235, 2023.
- GameGPT: Multi-agent collaborative framework for game development. arXiv:2310.08067, 2023.
- AutoAgents: A framework for Automatic Agent Generation. arXiv:2309.17288, 2023.
- Towards end-to-end embodied decision making with multi-modal large language model. In NeurIPS 2023 Workshop, 2023.
- Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv:2308.10848, 2023.
- Neural mechanisms for interacting with a world full of action choices. Annual review of neuroscience, 2010.
- DesignGPT: Multi-agent collaboration in design. arXiv:2311.11591, 2023.
- Yijiang River Dong. COTTAGE: Coherent Text Adventure Games Generation. PhD thesis, University of Pennsylvania, 2023.
- PaLM-E: An embodied multimodal language model. arXiv:2303.03378, 2023.
- Improving factuality and reasoning in language models through multiagent debate. arXiv:2305.14325, 2023.
- MineDojo: Building open-ended embodied agents with internet-scale knowledge. NeurIPS, 2022.
- LLaMA Rider: Spurring large language models to explore the open world. arXiv:2310.08922, 2023.
- Improving language model negotiation with self-play and in-context learning from AI feedback. arXiv:2305.10142, 2023.
- MindAgent: Emergent gaming interaction. arXiv:2309.09971, 2023.
- Suspicion-agent: Playing imperfect information games with theory of mind-aware GPT-4. arXiv:2309.17277, 2023.
- Akshat Gupta. Are ChatGPT and GPT-4 Good Poker Players? – A Pre-Flop Analysis. arXiv:2308.12466, 2023.
- MetaGPT: Meta programming for a multi-agent collaborative framework. arXiv:2308.00352, 2023.
- PokéLLMon: A human-parity agent for Pokémon battles with large language models. arXiv:2402.01118, 2024.
- Inner monologue: Embodied reasoning through planning with language models. arXiv:2207.05608, 2022.
- Qiuyuan Huang and Jae Sung Park et al. ArK: Augmented reality with knowledge interactive emergent ability. arXiv:2305.00970, 2023.
- Reinforcement learning agents for Ubisoft’s Roller Champions. arXiv:2012.06031, 2020.
- AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation. arXiv:2305.18898, 2023.
- Lyfe agents: Generative agents for low-cost real-time social interactions. arXiv:2310.02172, 2023.
- LLM-based agent society investigation: Collaboration and confrontation in Avalon gameplay. arXiv:2310.14985, 2023.
- Guohao Li and Hasan Abed Al Kader Hammoud et al. CAMEL: Communicative agents for ’Mind’ exploration of large language model society. In The 37th NeurIPS, 2023.
- Auto MC-Reward: Automated dense reward design with large language models for Minecraft. arXiv:2312.09238, 2023.
- Assessing logical puzzle solving in large language models: Insights from a Minesweeper case study. arXiv:2311.07387, 2023.
- MetaAgents: Simulating interactions of human behaviors for LLM-based task-oriented coordination via collaborative generative agents. arXiv:2310.06500, 2023.
- Tachikuma: Understanding complex interactions with multi-character and novel objects by large language models. arXiv:2307.12573, 2023.
- Steve-1: A generative model for text-to-behavior in Minecraft. In NeurIPS 2023 Workshop, 2023.
- AvalonBench: Evaluating LLMs playing the game of Avalon. In NeurIPS 2023 Workshop, 2023.
- LLM-powered hierarchical language agent for real-time human-AI coordination. arXiv:2312.15224, 2024.
- Large language models play StarCraft II: Benchmarks and a chain of summarization approach. arXiv:2312.11865, 2023.
- Meta’s Fundamental AI Research Diplomacy Team. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 2022.
- The 2003 Report of the IGDA’s Artificial Intelligence Interface Standards Committee. International Game Developers Association (IGDA) Technical Report, 2003.
- Alexander Nareyek and Börje F. Karlsson et al. The 2004 Report of the IGDA’s Artificial Intelligence Interface Standards Committee. IGDA Technical Report, 2004.
- The 2005 Report of the IGDA’s Artificial Intelligence Interface Standards Committee. IGDA Technical Report, 2005.
- Selective perception: Optimizing state descriptions with reinforcement learning for language model actors. arXiv:2307.11922, 2023.
- OpenAI. ChatGPT can now see, hear, and speak, 2023.
- Social simulacra: Creating populated prototypes for social computing systems. In ACM UIST, 2022.
- Generative agents: Interactive simulacra of human behavior. In The 36th UIST, 2023.
- Gorilla: Large language model connected with massive apis. arXiv:2305.15334, 2023.
- diff history for neural language agents. arXiv:2312.07540, 2023.
- Communicative agents for software development. arXiv:2307.07924, 2023.
- Scaling instructable agents across many simulated worlds. Technical Report, 2024.
- SayPlan: Grounding large language models using 3D scene graphs for scalable task planning. arXiv:2307.06135, 2023.
- Visual encoders for data-efficient imitation learning in modern video games. arXiv:2312.02312, 2023.
- Timo Schick and Jane Dwivedi-Yu et al. Toolformer: Language models can teach themselves to use tools. arXiv:2302.04761, 2023.
- Character-LLM: A trainable agent for role-playing. In EMNLP, 2023.
- MarioGPT: Open-Ended Text2Level Generation through Large Language Models. arXiv:2302.05981, 2023.
- Large language models are pretty good zero-shot video game bug detectors. arXiv:2210.02506, 2022.
- GlitchBench: Can large multimodal models detect video game glitches? arXiv:2312.05291, 2023.
- Searching bug instances in gameplay video repositories. IEEE Transactions on Games, 2024.
- True knowledge comes from practice: Aligning large language models with embodied environments via reinforcement learning. In ICLR, 2024.
- Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study. arXiv:2403.03186, 2024.
- Can large language models play text games well? Current state-of-the-art and open questions. arXiv:2304.02868, 2023.
- Voyager: An open-ended embodied agent with large language models. arXiv:2305.16291, 2023.
- Apollo’s oracle: Retrieval-augmented reasoning in multi-agent debates. arXiv:2312.04854, 2023.
- Avalon’s game of thoughts: Battle against deception through recursive contemplation. arXiv:2310.01320, 2023.
- Self-consistency improves chain of thought reasoning in language models. arXiv:2203.11171, 2023.
- Open-world story generation with structured knowledge enhancement: A comprehensive survey. Neurocomputing, 2023.
- RoleLLM: Benchmarking, eliciting, and enhancing role-playing abilities of large language models. arXiv:2310.00746, 2023.
- Describe, explain, plan and select: Interactive planning with LLMs enables open-world multi-task agents. In The 37th NeurIPS, 2023.
- JARVIS-1: Open-world multi-task agents with memory-augmented multimodal language models. arXiv:2311.05997, 2023.
- Honor of Kings Arena: An environment for generalization in competitive reinforcement learning. In NeurIPS, 2022.
- Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 35:24824–24837, 2022.
- Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv:2303.04671, 2023.
- Deciphering digital detectives: Understanding LLM behaviors and capabilities in multi-agent mystery games. arXiv:2312.00746, 2023.
- TidyBot: Personalized robot assistance with large language models. arXiv:2305.05658, 2023.
- AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv:2308.08155, 2023.
- SPRING: Studying papers and reasoning to play games. In The 37th NeurIPS, 2023.
- SmartPlay: A benchmark for LLMs as intelligent agents. 2023.
- Embodied task planning with large language models. arXiv:2307.01848, 2023.
- The rise and potential of large language model based agents: A survey. arXiv:2309.07864, 2023.
- Robotic skill acquisition via instruction augmentation with vision-language models. RSS, 2023.
- Exploring large language models for communication games: An empirical study on Werewolf. arXiv:2309.04658, 2023.
- Language agents with reinforcement learning for strategic play in the Werewolf game. arXiv:2310.18940, 2023.
- Octopus: Embodied vision-language programmer from environmental feedback. arXiv:2310.08588, 2023.
- Skill reinforcement learning and planning for open-world long-horizon tasks. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
- Building open-ended embodied agent via language-policy bidirectional adaptation. arXiv:2401.00006, 2024.
- ProAgent: Building proactive cooperative AI with large language models. In AAAI, 2024.
- Building cooperative embodied agents modularly with large language models. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
- SPRINT: Scalable policy pre-training via language instruction relabeling. arXiv:2306.11886, 2023.
- Steve-Eye: Equipping LLM-based embodied agents with visual perception in open worlds. arXiv:2310.13255, 2023.
- Wangchunshu Zhou and Yuchen Eleanor Jiang et al. Agents: An open-source framework for autonomous language agents. arXiv:2309.07870, 2023.
- CALYPSO: LLMs as dungeon masters’ assistants. In AIIDE, 2023.
- Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. 2023.