ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (2312.10003v1)
Abstract: Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with an LLM to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.
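The abstract describes a grow-then-improve loop: sample ReAct-style trajectories with the current policy, score them with AI feedback, and fine-tune on the retained trajectories before repeating. The Python sketch below illustrates one plausible reading of that loop; the `Trajectory` container, the helper callables `run_agent`, `score_trajectory`, and `fine_tune`, and the fixed `reward_threshold` are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One ReAct-style rollout: interleaved thoughts, actions, and observations."""
    question: str
    steps: List[str]
    answer: str
    reward: float = 0.0  # AI-feedback score, filled in during the Improve phase


def rest_self_improvement(
    questions: List[str],
    run_agent: Callable[[str], Trajectory],           # hypothetical: rolls out the current agent
    score_trajectory: Callable[[Trajectory], float],  # hypothetical: AI-feedback reward model
    fine_tune: Callable[[List[Trajectory]], None],    # hypothetical: updates the policy in place
    iterations: int = 2,
    samples_per_question: int = 4,
    reward_threshold: float = 0.5,
) -> None:
    """ReST-like loop: grow a batch of trajectories, filter with AI feedback, fine-tune."""
    for _ in range(iterations):
        # Grow: sample several trajectories per question with the current policy.
        batch = [run_agent(q) for q in questions for _ in range(samples_per_question)]

        # Improve: score each trajectory with AI feedback and keep only the good ones.
        for traj in batch:
            traj.reward = score_trajectory(traj)
        selected = [t for t in batch if t.reward >= reward_threshold]

        # Fine-tune the (possibly smaller, distilled) model on the selected trajectories.
        fine_tune(selected)
```

The sketch also hints at the self-distillation claim: the `fine_tune` step can target a smaller model than the one that generated the trajectories, so after a couple of iterations the distilled model has only seen filtered, higher-quality rollouts.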
- PaLM 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
- Vladimir Blagojevic. Long-form QA beyond ELI5: an updated dataset and approach, 2022. URL towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb.
- Harrison Chase. LangChain. https://github.com/hwchase17/langchain, 2022.
- FireAct: Toward language agent fine-tuning, 2023.
- Language model cascades, 2022.
- RAFT: Reward ranked finetuning for generative foundation model alignment, 2023.
- ELI5: Long form question answering. CoRR, abs/1907.09190, 2019. URL http://arxiv.org/abs/1907.09190.
- PAL: Program-aided language models, 2023.
- Reinforced self-training (ReST) for language modeling. arXiv preprint arXiv:2308.08998, 2023.
- Large language models cannot self-correct reasoning yet, 2023.
- Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive NLP, 2023a.
- DSPy: Compiling declarative language model calls into self-improving pipelines, 2023b.
- Hurdles to progress in long-form question answering, 2021.
- Let’s verify step by step, 2023.
- Jerry Liu. LlamaIndex. https://github.com/jerryjliu/llama_index, 2022.
- Self-refine: Iterative refinement with self-feedback, 2023.
- WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021.
- Measuring and narrowing the compositionality gap in language models, 2023.
- Iterated decomposition: Improving science q&a by supervising reasoning processes, 2023.
- Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.
- Beyond human data: Scaling self-training for problem-solving with language models, 2023.
- Solving math word problems with process- and outcome-based feedback, 2022.
- The rise and potential of large language model based agents: A survey, 2023.
- HotpotQA: A dataset for diverse, explainable multi-hop question answering. CoRR, abs/1809.09600, 2018. URL http://arxiv.org/abs/1809.09600.
- ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
- STaR: Bootstrapping reasoning with reasoning, 2022.