EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (2403.12014v2)
Abstract: Recent state-of-the-art (SOTA) approaches for embodied learning via interaction directly employ LLMs as agents to determine the next steps in an environment. Thanks to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous, smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use their reasoning capabilities to adaptively create training environments that help smaller RL agents learn the skills they are weak at? We propose EnvGen, a novel framework that addresses this question. First, we prompt an LLM to generate training environments: we give it the task description and the simulator objectives the agent should learn, and ask it to produce a set of environment configurations (e.g., different terrains, or items initially given to the agent). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we let the LLM continuously adapt the generated environments to progressively improve the skills the agent is weak at, by providing the LLM with feedback in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in the Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We also show that using an LLM to adapt environments dynamically outperforms curriculum learning approaches, and we analyze how the environments are adapted over time to improve the RL agent's weaker skills. Additionally, EnvGen is substantially more efficient, as it uses only a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of calls. Lastly, we present detailed ablation studies of EnvGen's design choices.
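To make the loop concrete, below is a minimal, runnable Python sketch of the generate-train-feedback cycle the abstract describes. Everything in it is a hypothetical placeholder: `query_llm`, `train_on_mixture`, `measure_skills`, the skill names, and the config fields are stand-ins, and both the LLM call and the RL training are simulated, so only the control flow reflects EnvGen.

```python
import random

# Illustrative skill names; the actual benchmarks (e.g., Crafter) define
# their own achievement sets.
SKILLS = ["collect_wood", "make_wood_pickaxe", "defeat_zombie"]

def query_llm(feedback):
    """One (simulated) LLM call: propose environment configurations.

    EnvGen sends the task description, simulator objectives, and -- after
    the first cycle -- the agent's per-skill success rates; here we simply
    target any skill whose success rate is below 0.5.
    """
    weak = [s for s, rate in (feedback or {}).items() if rate < 0.5]
    return [
        {
            "terrain": random.choice(["forest", "cave", "plains"]),
            "initial_items": ["wood"] if "pickaxe" in skill else [],
            "target_skill": skill,
        }
        for skill in (weak or SKILLS)
    ]

def train_on_mixture(agent, llm_envs, steps):
    """Simulated RL training on a mix of LLM-generated and original envs."""
    for cfg in llm_envs:
        skill = cfg["target_skill"]
        # Stand-in for `steps` of RL updates: practicing a skill raises
        # its success rate.
        agent[skill] = min(1.0, agent[skill] + 0.3 * random.random())

def measure_skills(agent):
    """Per-skill success rates in the original env -> feedback for the LLM."""
    return dict(agent)

agent = {skill: 0.1 for skill in SKILLS}  # a small RL agent, not an LLM
feedback = None
for cycle in range(4):                    # only ~4 LLM calls in total
    llm_envs = query_llm(feedback)        # 1) generate / adapt environments
    train_on_mixture(agent, llm_envs, steps=100_000)  # 2)+3) train on mixture
    feedback = measure_skills(agent)      # 4) send performance back
print(feedback)
```

The property this sketch preserves is that the LLM sits outside the environment-interaction loop: it is queried once per cycle (here, 4 calls in total) to reshape the training-environment distribution toward weak skills, while every environment step is taken by the cheap RL agent.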
- Delf: Designing Learning Environments with Foundation Models. In AAAI Workshop, 2024.
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. In CoRL, 2022.
- PaLM 2 Technical Report, 2023.
- Playing Hard Exploration Games by Watching YouTube. In NeurIPS, 2018. URL http://arxiv.org/abs/1805.11592.
- Layer Normalization. In NIPS 2016 Deep Learning Symposium, 2016. URL http://arxiv.org/abs/1607.06450.
- Unifying Count-Based Exploration and Intrinsic Motivation. In NIPS, 2016.
- Language Models are Few-Shot Learners. In NeurIPS, 2020. URL http://arxiv.org/abs/2005.14165.
- Exploration by Random Network Distillation. In ICLR, 2018.
- Evaluating Large Language Models Trained on Code, 2021.
- PaLM: Scaling Language Modeling with Pathways. JMLR, pp. 1–83, 2023. URL http://arxiv.org/abs/2204.02311.
- Leveraging Procedural Generation to Benchmark Reinforcement Learning. In ICML, 2020. URL https://proceedings.mlr.press/v119/cobbe20a.html.
- PaLM-E: An Embodied Multimodal Language Model. In ICML, 2023. URL http://arxiv.org/abs/2303.03378.
- Guiding Pretraining in Reinforcement Learning with Large Language Models. In ICML, 2023.
- A Survey of Embodied AI: From Simulators to Research Tasks. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022. doi: 10.1109/TETCI.2022.3141105.
- IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. In ICML, 2018.
- Gemini Team. Gemini: A Family of Highly Capable Multimodal Models, 2023.
- Generative Adversarial Networks. In NIPS, 2014. URL http://arxiv.org/abs/1406.2661.
- DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence, 2024. URL https://arxiv.org/abs/2401.14196.
- Danijar Hafner. Benchmarking the Spectrum of Agent Capabilities. In ICLR, 2022. URL https://github.com/danijar/crafter.
- Dream to Control: Learning Behaviors by Latent Imagination. In ICLR, 2020.
- Mastering Atari with Discrete World Models. In ICLR, 2021.
- Mastering Diverse Domains through World Models, 2023. URL http://arxiv.org/abs/2301.04104.
- Deep Residual Learning for Image Recognition. In CVPR, 2016.
- Rainbow: Combining Improvements in Deep Reinforcement Learning. In AAAI, 2018. doi: 10.1609/aaai.v32i1.11796.
- Large Language Models are Zero-Shot Reasoners. In NeurIPS, 2022. URL http://arxiv.org/abs/2205.11916.
- Generating Game Levels for Multiple Distinct Games with a Common Latent Space. In AIIDE, pp. 109–115, 2020. doi: 10.1609/aiide.v16i1.7485.
- SCENECRAFT: Automating Interactive Narrative Scene Generation in Digital Games with Large Language Models. In AIIDE, pp. 86–96, 2023. doi: 10.1609/aiide.v19i1.27504.
- Reward Design with Language Models. In ICLR, 2023.
- Exploring Long-Horizon Reasoning with Deep RL in Combinatorially Hard Tasks. In Decision Awareness in Reinforcement Learning Workshop at ICML, 2022a. URL https://openreview.net/forum?id=7vPSZASOF0o.
- Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. In CVPR, 2024.
- PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation. In NeurIPS, 2023.
- EnvEdit: Environment Editing for Vision-and-Language Navigation. In CVPR, 2022b.
- Deep Learning for Procedural Content Generation. Neural Computing and Applications, 33(1):19–37, January 2021. doi: 10.1007/s00521-020-05383-8.
- Eureka: Human-Level Reward Design via Coding Large Language Models. arXiv:2310.12931, 2023.
- Mojang Studios. Minecraft, 2009. URL https://www.minecraft.net/.
- Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. In NeurIPS, 2023. URL http://arxiv.org/abs/2307.03486.
- Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021. URL http://arxiv.org/abs/2112.00114.
- OpenAI. GPT-4 Technical Report. arXiv, 2023a. URL https://api.semanticscholar.org/CorpusID:257532815.
- OpenAI. ChatGPT. https://openai.com/chatgpt, 2023b.
- TOAD-GAN: A Flexible Framework for Few-Shot Level Generation in Token-Based Games. IEEE Transactions on Games, 14(2):284–293, 2022. doi: 10.1109/TG.2021.3069833.
- Proximal Policy Optimization Algorithms, 2017.
- Planning to Explore via Self-Supervised World Models. In ICML, 2020.
- Procedural Content Generation in Games. Springer, 1st edition, 2016.
- Learning to Generalize with Object-Centric Agents in the Open-World Survival Game Crafter. IEEE Transactions on Games, 2023.
- MarioGPT: Open-Ended Text2Level Generation through Large Language Models. In NeurIPS, 2023. URL http://arxiv.org/abs/2302.05981.
- Reinforcement Learning: An Introduction. The MIT Press, 2nd edition, 2018.
- Level Generation Through Large Language Models. In FDG, 2023. doi: 10.1145/3582437.3587211.
- LLaMA: Open and Efficient Foundation Language Models, 2023a.
- Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023b.
- Investigating the Role of Model-Based Learning in Exploration and Transfer. In ICML, 2023.
- Voyager: An Open-Ended Embodied Agent with Large Language Models, 2023a. URL http://arxiv.org/abs/2305.16291.
- ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games. In EMNLP, 2023b. URL https://aclanthology.org/2023.emnlp-main.830.
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents. In NeurIPS, 2023c. URL http://arxiv.org/abs/2302.01560.
- JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models, 2023d. URL http://arxiv.org/abs/2311.05997.
- Scaling Data Generation in Vision-and-Language Navigation. In ICCV, 2023e.
- Christopher J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, England, May 1989.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In NeurIPS, 2022. URL http://arxiv.org/abs/2201.11903.
- Lilian Weng. Exploration Strategies in Deep Reinforcement Learning. Blog post, lilianweng.github.io, June 2020. URL https://lilianweng.github.io/posts/2020-06-07-exploration-drl/.
- SPRING: Studying the Paper and Reasoning to Play Games. In NeurIPS, 2023. URL http://arxiv.org/abs/2305.15486.
- ReAct: Synergizing Reasoning and Acting in Language Models. In ICLR, 2023. URL http://arxiv.org/abs/2210.03629.
- Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks. In Foundation Models for Decision Making Workshop at NeurIPS, 2023.
- See and Think: Embodied Agent in Virtual Environment, 2023. URL http://arxiv.org/abs/2311.15209.