Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Planning Case Study (2401.06603v2)

Published 12 Jan 2024 in cs.CL

Abstract: LLMs have demonstrated capabilities that are valuable to reinforcement learning (RL) models, such as planning and reasoning, but how LLMs and RL models should collaborate remains an open problem. In this study, we address it with a teacher-student learning framework in a cooperative multi-agent setting: RL models provide feedback to LLMs, while LLMs supply high-level information to RL models. Within this framework, the LLM acts as a teacher and the RL model acts as a student. The two agents cooperatively assist each other through a process of recursive help ("I help you help I help"). The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement. In turn, the RL agent offers real-time feedback to the LLM agent, helping it generate more useful tokens. This bi-directional feedback loop promotes optimization, exploration, and mutual improvement for both agents, enabling them to accomplish increasingly challenging tasks. We propose a practical algorithm for this setting and evaluate its effectiveness in empirical experiments.
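
To make the loop concrete, the following is a minimal Python sketch of one plausible reading of the bi-directional feedback cycle: the teacher proposes a high-level subgoal, the student trains toward it and reports aggregate results, and the teacher uses that feedback to shape its next proposal. This is an illustration under assumed interfaces, not the paper's algorithm; the TeacherLLM, StudentRL, and Feedback names, the toy environment dynamics, and all hyperparameters are hypothetical.

import random
from dataclasses import dataclass


@dataclass
class Feedback:
    """Real-time information the RL student returns to the LLM teacher (hypothetical schema)."""
    subgoal: str
    success_rate: float
    mean_return: float


class TeacherLLM:
    """Stand-in for the LLM teacher: proposes high-level subgoals and adapts to student feedback."""

    def __init__(self, subgoals):
        self.subgoals = subgoals
        self.history = []  # feedback received so far

    def propose_subgoal(self) -> str:
        # Exploit subgoals that have produced high returns; otherwise explore.
        if self.history and random.random() < 0.8:
            best = max(self.history, key=lambda fb: fb.mean_return)
            return best.subgoal
        return random.choice(self.subgoals)

    def receive_feedback(self, fb: Feedback) -> None:
        self.history.append(fb)


class StudentRL:
    """Stand-in for the RL student: trains on the proposed subgoal and reports aggregate feedback."""

    def __init__(self):
        self.skill = {}  # per-subgoal proficiency, improves with practice

    def train_on(self, subgoal: str, episodes: int = 10) -> Feedback:
        returns = []
        for _ in range(episodes):
            proficiency = self.skill.get(subgoal, 0.1)
            returns.append(proficiency + random.gauss(0.0, 0.05))
            # Toy "policy improvement": practicing a subgoal raises proficiency on it.
            self.skill[subgoal] = min(1.0, proficiency + 0.02)
        mean_return = sum(returns) / len(returns)
        success_rate = sum(r > 0.5 for r in returns) / len(returns)
        return Feedback(subgoal, success_rate, mean_return)


def bidirectional_loop(rounds: int = 20) -> None:
    teacher = TeacherLLM(["reach_door", "pick_key", "open_box"])
    student = StudentRL()
    for t in range(rounds):
        subgoal = teacher.propose_subgoal()    # LLM -> RL: high-level information
        feedback = student.train_on(subgoal)   # RL trains toward the subgoal
        teacher.receive_feedback(feedback)     # RL -> LLM: feedback signal
        print(f"round {t:2d} | subgoal={subgoal:<10} "
              f"return={feedback.mean_return:.2f} "
              f"success={feedback.success_rate:.0%}")


if __name__ == "__main__":
    bidirectional_loop()

Running the sketch shows the intended dynamic: early rounds mix exploration across subgoals, and as the student's reported returns grow, the teacher's proposals concentrate on the subgoals the student handles well.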

