RL-GPT: Integrating Reinforcement Learning and Code-as-policy (2402.19299v1)

Published 29 Feb 2024 in cs.AI and cs.LG

Abstract: LLMs have demonstrated proficiency in utilizing various tools by coding, yet they face limitations in handling intricate logic and precise control. In embodied tasks, high-level planning is amenable to direct coding, while low-level actions often necessitate task-specific refinement, such as Reinforcement Learning (RL). To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline. Our approach outperforms traditional RL methods and existing GPT agents, demonstrating superior efficiency. In the Minecraft game, it rapidly obtains diamonds within a single day on an RTX3090. Additionally, it achieves SOTA performance across all designated MineDojo tasks.
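The two-level decomposition the abstract describes can be sketched in miniature: a slow agent labels each subtask as directly codable or as needing RL refinement, and a fast agent emits a code-as-policy for the codable ones. All names below (`slow_agent`, `fast_agent`, `build_pipeline`, the toy heuristic) are illustrative assumptions, not the paper's actual prompts or interfaces; in RL-GPT both agents are LLM-driven and the RL branch is trained with a standard algorithm such as PPO.

```python
# Hypothetical sketch of RL-GPT's slow/fast agent split.
# The toy heuristic below stands in for the slow agent's LLM analysis.

def slow_agent(subtasks):
    """Decide, per subtask, whether it suits direct coding
    or needs task-specific RL refinement."""
    return {
        name: "rl" if needs_precise_control else "code"
        for name, needs_precise_control in subtasks
    }

def fast_agent(name):
    """Emit a code-as-policy stub for a codable subtask
    (placeholder for LLM code generation)."""
    return f"def {name}_policy(obs): return scripted_actions(obs)"

def build_pipeline(subtasks):
    """Assemble the hybrid pipeline: coded policies where possible,
    RL training slots where precise control is required."""
    plan = slow_agent(subtasks)
    pipeline = []
    for name, _ in subtasks:
        if plan[name] == "code":
            pipeline.append(("code", name, fast_agent(name)))
        else:
            pipeline.append(("rl", name, None))  # to be trained, e.g. with PPO
    return pipeline

# Example: a Minecraft-style task split into a codable high-level step
# and a low-level step needing precise control.
tasks = [("navigate_to_tree", False), ("chop_log", True)]
for kind, name, policy in build_pipeline(tasks):
    print(kind, name)
```

The key design point this mirrors is that each agent sees only its own job: the slow agent never writes code and the fast agent never decides what is codable, which is what the paper credits for the pipeline's efficiency.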
