Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

arXiv:2303.16563
Published Mar 29, 2023 in cs.LG and cs.AI

Abstract

We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. Using the popular open-world game Minecraft as the testbed, we propose three types of fine-grained basic skills, and use RL with intrinsic rewards to acquire these skills. A novel Finding-skill that performs exploration to find diverse items provides better initialization for other skills, improving the sample efficiency for skill learning. In skill planning, we leverage the prior knowledge in LLMs to find the relationships between skills and build a skill graph. When the agent is solving a task, our skill search algorithm walks over the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 40 diverse Minecraft tasks, where many tasks require sequentially executing more than 10 skills. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks. The project's website and code can be found at https://sites.google.com/view/plan4mc.
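As a rough illustration of the skill-planning step, the sketch below encodes a tiny skill graph and resolves prerequisites depth-first to produce an ordered skill plan. This is not the authors' Plan4MC implementation: the skill names, item counts, and dependency format are simplified assumptions, and the paper's actual graph is built from skill relationships extracted with an LLM rather than hand-written.

```python
# Minimal sketch (not the Plan4MC implementation): planning over a skill graph.
# Skill names, item counts, and the dependency format below are illustrative
# assumptions; the real skill graph is built from LLM-provided relationships.

SKILL_GRAPH = {
    # skill: (items consumed per execution, item produced)
    "find_tree":            ({}, "tree"),
    "harvest_log":          ({"tree": 1}, "log"),
    "craft_planks":         ({"log": 1}, "planks"),
    "craft_stick":          ({"planks": 2}, "stick"),
    "craft_table":          ({"planks": 4}, "crafting_table"),
    "craft_wooden_pickaxe": ({"planks": 3, "stick": 2, "crafting_table": 1},
                             "wooden_pickaxe"),
}

# Reverse index: which skill produces which item.
PRODUCERS = {item: skill for skill, (_, item) in SKILL_GRAPH.items()}


def plan(target_item, inventory):
    """Walk the skill graph depth-first: recursively schedule skills that
    produce missing prerequisites, then the skill producing the target."""
    steps = []

    def resolve(item, count):
        if inventory.get(item, 0) >= count:
            return
        skill = PRODUCERS[item]
        needs, _ = SKILL_GRAPH[skill]
        while inventory.get(item, 0) < count:
            for pre_item, pre_count in needs.items():
                resolve(pre_item, pre_count)
                # Reserve (consume) the prerequisite for this execution.
                inventory[pre_item] = inventory.get(pre_item, 0) - pre_count
            inventory[item] = inventory.get(item, 0) + 1
            steps.append(skill)

    resolve(target_item, 1)
    return steps


if __name__ == "__main__":
    # Example: an ordered skill sequence for a wooden pickaxe, starting empty.
    print(plan("wooden_pickaxe", {}))
```

Running the example prints a long ordered sequence of repeated find_tree / harvest_log / craft_planks steps followed by craft_stick, craft_table, and finally craft_wooden_pickaxe, which mirrors the kind of 10+ step skill sequences the tasks require.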

