AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback (2309.17176v3)

Published 29 Sep 2023 in cs.AI and cs.CL

Abstract: LLMs have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter LLM (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter have demonstrated its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
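The abstract describes a loop in which a lightweight Adapter LM refines the task description using feedback from the RL agent, while a general-purpose decision LLM turns that refined description into guidance for the agent. The sketch below illustrates that loop only at the level of detail given in the abstract; every name (adapter_lm_refine, decision_llm_suggest, run_rl_episode, EpisodeFeedback) and all of the stubbed logic are illustrative assumptions, not the paper's actual implementation or API.

```python
# Minimal sketch of the adapter-LM / RL feedback loop described in the abstract.
# All names and the stubbed logic are hypothetical placeholders.

from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeFeedback:
    """Signals the RL agent sends back to the adapter (hypothetical fields)."""
    achieved_goals: List[str] = field(default_factory=list)
    failed_goals: List[str] = field(default_factory=list)
    episode_return: float = 0.0


def adapter_lm_refine(task_description: str, feedback: EpisodeFeedback) -> str:
    """Stand-in for the lightweight Adapter LM: refine the task comprehension
    text from RL feedback. A real system would run a small tunable LM here."""
    hints = []
    if feedback.failed_goals:
        hints.append(f"Focus on prerequisites for: {', '.join(feedback.failed_goals)}.")
    if feedback.achieved_goals:
        hints.append(f"Already mastered: {', '.join(feedback.achieved_goals)}; aim higher.")
    return task_description + (" " + " ".join(hints) if hints else "")


def decision_llm_suggest(refined_description: str) -> List[str]:
    """Stand-in for the frozen, general-purpose LLM that turns the refined
    description into candidate sub-goals (fixed placeholder output here)."""
    return ["collect wood", "craft pickaxe", "mine stone"]


def run_rl_episode(suggested_goals: List[str]) -> EpisodeFeedback:
    """Stand-in for one RL episode in an environment like Crafter; the agent's
    outcomes become feedback for the adapter on the next iteration."""
    return EpisodeFeedback(achieved_goals=suggested_goals[:1],
                           failed_goals=suggested_goals[1:],
                           episode_return=1.0)


task = "Survive and unlock achievements in an open-world environment."
feedback = EpisodeFeedback()
for iteration in range(3):
    refined = adapter_lm_refine(task, feedback)   # adapter refines task comprehension
    goals = decision_llm_suggest(refined)         # frozen LLM proposes guidance
    feedback = run_rl_episode(goals)              # RL rollout produces new feedback
    print(f"iter {iteration}: goals={goals}, return={feedback.episode_return}")
```

Under these assumptions, the adapter is the only component that would need updating, which matches the abstract's claim that AdaRefiner avoids intensive fine-tuning of the main LLM while still adapting to the downstream task.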
