
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

(2310.11511)
Published Oct 17, 2023 in cs.CL, cs.AI, and cs.LG

Abstract

Despite their remarkable capabilities, LLMs often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.

Self-RAG enhances generation quality and factuality by teaching a single LM to retrieve on demand, generate, and critique its own output.

Overview

  • Introduces Self-RAG, a method that improves LLMs with adaptive retrieval and self-critique via reflection tokens.

  • Self-RAG uses retrieval tokens for on-demand access to external knowledge and critique tokens for self-assessment of its generations.

  • Incorporates reflection tokens into training so the LLM learns when to retrieve information and how to judge the quality of its own output.

  • Outperforms conventional LLMs and RAG models on tasks such as open-domain QA and long-form generation, improving factuality and citation accuracy.

  • Presents a dynamic and versatile approach to retrieval-augmented LLMs, enabling self-improvement and inference-time adaptability.

Introduction

Retrieval-Augmented Generation (RAG) has proven effective at improving the output of LLMs by coupling them with relevant knowledge sources. However, existing approaches have drawbacks, such as inflexible, always-on retrieval and potential inconsistency between the generated text and the retrieved passages. To address these limitations, this paper introduces Self-RAG, a method that lets an LM retrieve on demand and use reflection tokens to assess its own generations.

Model Framework

The Self-RAG framework distinguishes itself with its adaptive retrieval mechanism and self-reflective generation process. Instead of retrieving a fixed number of passages for every input, Self-RAG decides on demand when retrieval is needed. During generation, reflection tokens guide and evaluate the process: retrieval tokens signal whether retrieval is necessary, while critique tokens grade each generated segment for relevance to the query, support from the retrieved passages, and overall usefulness. This design improves performance across tasks without sacrificing the model's original versatility.
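To make the mechanism concrete, here is a minimal Python sketch of the adaptive retrieve-generate-critique loop for a single output segment. The `lm` and `retriever` interfaces (`predict_token`, `generate`, `token_prob`, `search`), the weighting scheme, and the simplified token values are assumptions for illustration, not the authors' released API.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    is_rel: float   # probability of IsRel = "relevant"
    is_sup: float   # probability of IsSup = "fully supported"
    is_use: float   # probability of IsUse = highest usefulness rating

def self_rag_segment(lm, retriever, prompt, so_far,
                     w_rel=1.0, w_sup=1.0, w_use=0.5, k=5):
    """Generate one output segment, retrieving and self-critiquing on demand."""
    context = prompt + so_far

    # 1. The LM first emits a retrieval token deciding whether evidence is needed.
    if lm.predict_token(context, token_type="Retrieve") != "yes":
        # No retrieval: continue generating from parametric knowledge alone.
        return lm.generate(context)

    # 2. Retrieve passages and generate one candidate continuation per passage.
    candidates = []
    for passage in retriever.search(context, k=k):
        text = lm.generate(context, evidence=passage)
        # 3. Critique tokens grade relevance, support, and usefulness.
        candidates.append(Candidate(
            text=text,
            is_rel=lm.token_prob(context, text, token_type="IsRel", value="relevant"),
            is_sup=lm.token_prob(context, text, token_type="IsSup", value="fully supported"),
            is_use=lm.token_prob(context, text, token_type="IsUse", value="5"),
        ))

    # 4. Keep the candidate with the best weighted critique score.
    best = max(candidates,
               key=lambda c: w_rel * c.is_rel + w_sup * c.is_sup + w_use * c.is_use)
    return best.text
```

Because the critique scores are just token probabilities, weights such as `w_rel`, `w_sup`, and `w_use` can in principle be adjusted at inference time to trade off, say, citation support against fluency without retraining.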

Training and Inference

The paper describes how reflection tokens are integrated into the LM's training data, allowing the model to predict when retrieval is beneficial and to evaluate the quality of its own outputs. A separate critic model is trained to generate these reflection tokens, and its annotations are distilled into the generator's training corpus, so no critic or proprietary LM is needed during operation. At inference time, Self-RAG can adjust its behavior to task requirements by weighting or constraining the reflection tokens, providing a flexible and customizable solution.
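A rough sketch of how the critic's annotations could be folded into the generator's training data follows. The `critic.predict` and `retriever.search` helpers, the segmentation of outputs, and the bracketed token spellings are illustrative assumptions rather than the paper's exact data format.

```python
def build_training_sequence(critic, retriever, instruction, output_segments):
    """Interleave retrieved passages and critic-predicted reflection tokens
    with a supervised output, producing one augmented training example."""
    parts = [instruction]
    for segment in output_segments:
        # The critic decides whether this segment should be grounded in retrieval.
        retrieve = critic.predict("Retrieve", instruction, segment)
        parts.append(f"[Retrieve={retrieve}]")
        if retrieve == "yes":
            # Attach the top passage and the critic's relevance/support judgments.
            passage = retriever.search(instruction + " " + segment, k=1)[0]
            parts.append(f"<paragraph>{passage}</paragraph>")
            parts.append(f"[IsRel={critic.predict('IsRel', segment, passage)}]")
            parts.append(segment)
            parts.append(f"[IsSup={critic.predict('IsSup', segment, passage)}]")
        else:
            parts.append(segment)
    # A final usefulness token rates the answer as a whole.
    full_output = " ".join(output_segments)
    parts.append(f"[IsUse={critic.predict('IsUse', instruction, full_output)}]")
    return " ".join(parts)
```

The generator is then fine-tuned with ordinary next-token prediction on such augmented sequences, which is why it can emit reflection tokens itself and needs neither the critic nor a proprietary LM at inference time.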

Empirical Evaluation

In evaluating Self-Rag, the authors find that it significantly outperforms both traditional LLMs and RAG approaches across a range of tasks, including open-domain QA and long-form generation. These improvements are not only in overall quality but also in factuality and citation accuracy. Furthermore, ablation studies confirm the essential role of both retrieval and critique mechanisms in maximizing the model's performance.

Conclusion

Self-RAG represents a significant step in the evolution of retrieval-augmented LLMs, offering a dynamic, self-assessing, and versatile framework that improves factual accuracy on knowledge-intensive tasks. By design, it provides a means for models to self-improve and adapt on the fly, marking a notable contribution to AI and natural language processing.
