An In-depth Survey of Large Language Model-based Artificial Intelligence Agents (2309.14365v1)
Abstract: Owing to the powerful capabilities demonstrated by large language models (LLMs), there has been a recent surge in efforts to integrate them into AI agents to enhance agent performance. In this paper, we explore the core differences and characteristics of LLM-based AI agents compared with traditional AI agents. Specifically, we first compare the fundamental characteristics of these two types of agents, clarifying the significant advantages of LLM-based agents in natural language handling, knowledge storage, and reasoning. We then conduct an in-depth analysis of the key components of AI agents: planning, memory, and tool use. For the crucial component of memory in particular, this paper introduces an innovative classification scheme that not only departs from traditional taxonomies but also offers a fresh perspective on the design of an AI agent's memory system. We firmly believe that in-depth research on and understanding of these core components will lay a solid foundation for future advances in AI agent technology. We conclude with directional suggestions for further research, in the hope of offering valuable insights to scholars and researchers in the field.