A Survey on Large Language Model-Based Game Agents (2404.02039v2)
Abstract: The development of game agents plays a critical role in advancing toward Artificial General Intelligence. Progress in large language models (LLMs) offers an unprecedented opportunity to evolve game agents and empower them with human-like decision-making capabilities in complex computer game environments. This paper provides a comprehensive overview of LLM-based game agents from a holistic viewpoint. First, we introduce the conceptual architecture of LLM-based game agents, centered on three core functional components: memory, reasoning, and input/output. Second, we survey representative LLM-based game agents documented in the literature with respect to their methodologies and adaptation agility across six game genres: adventure, communication, competition, cooperation, simulation, and crafting & exploration games. Finally, we present an outlook on future research and development directions in this burgeoning field. A curated list of relevant papers is maintained at: https://github.com/git-disl/awesome-LLM-game-agent-papers.
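To make the three-component architecture concrete, here is a minimal sketch of how memory, reasoning, and input/output might be wired into a single agent step. It is an illustration under stated assumptions, not the survey's own implementation: the names `Memory`, `GameAgent`, `perceive`, `act`, and `step` are hypothetical, memory is a simple recency window, and the reasoning component is any prompt-to-decision function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical memory component: a recency window of past percept/action records."""
    capacity: int = 5
    records: List[str] = field(default_factory=list)

    def store(self, record: str) -> None:
        self.records.append(record)
        # Keep only the most recent `capacity` records.
        if len(self.records) > self.capacity:
            self.records = self.records[-self.capacity:]

    def retrieve(self, k: int = 3) -> List[str]:
        # Return up to the k most recent records (oldest first).
        return self.records[-k:]


# Reasoning component: any function from a text prompt to a decision.
# In a real LLM-based agent this would wrap a model call (assumption, not shown).
Reasoner = Callable[[str], str]


@dataclass
class GameAgent:
    reason: Reasoner
    memory: Memory = field(default_factory=Memory)

    def perceive(self, observation: str) -> str:
        # Input: convert a raw game observation into text for the reasoner.
        return f"OBS: {observation.strip()}"

    def act(self, decision: str) -> str:
        # Output: normalize the reasoner's free-form decision into a game command.
        return decision.strip().lower()

    def step(self, observation: str) -> str:
        percept = self.perceive(observation)
        # Prompt = retrieved memories followed by the current percept.
        context = "\n".join(self.memory.retrieve() + [percept])
        action = self.act(self.reason(context))
        self.memory.store(f"{percept} -> {action}")
        return action
```

For example, with a stub reasoner such as `lambda p: "GO NORTH" if "dark" in p else "LOOK"`, calling `step("You are in a dark room.")` yields `"go north"` and records that exchange in memory, so later prompts carry it as context. Keeping the reasoner as a plain callable makes it easy to swap the stub for a real model client without touching the memory or input/output code.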