Voyager: An Open-Ended Embodied Agent with Large Language Models

(arXiv:2305.16291)
Published May 25, 2023 in cs.AI and cs.LG

Abstract

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
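To make the abstract's description of the agent loop more concrete, here is a minimal sketch of the iterative prompting mechanism and skill library it outlines: propose a program, execute it, feed environment feedback and execution errors back into the prompt, self-verify, and store successful programs for reuse. All names below (`propose_code` via `llm`, `run_in_minecraft`, `verify_success`, `SkillLibrary`) are hypothetical stand-ins chosen for illustration, not the paper's released API, and the retrieval step is simplified relative to the embedding-based retrieval the paper describes.

```python
# Sketch of the feedback-driven retry loop and skill-library accumulation
# described in the abstract. Function names are illustrative placeholders.

from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Ever-growing store of executable skills, keyed by a short description."""
    skills: dict = field(default_factory=dict)

    def add(self, description: str, program: str) -> None:
        self.skills[description] = program

    def retrieve(self, task: str, k: int = 3) -> list:
        # Placeholder retrieval: the paper uses embedding similarity over skill
        # descriptions; here we simply return the most recently added skills.
        return list(self.skills.values())[-k:]


def learn_task(task, llm, run_in_minecraft, verify_success, library, max_iters=4):
    """Iteratively prompt the LLM with feedback until the task is verified."""
    feedback, errors = "", ""
    for _ in range(max_iters):
        # Prompt includes the task, retrieved skills, and accumulated feedback.
        program = llm(task=task,
                      examples=library.retrieve(task),
                      env_feedback=feedback,
                      execution_errors=errors)
        feedback, errors = run_in_minecraft(program)   # hypothetical executor
        if verify_success(task, feedback):             # hypothetical self-check
            library.add(task, program)                 # compound abilities over time
            return program
    return None
```

This is only a structural sketch under the stated assumptions; in the paper, the proposed programs are executable code whose reuse and composition are what the abstract credits with rapid capability growth and reduced catastrophic forgetting.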
