RAFT: Adapting Language Model to Domain Specific RAG (2403.10131v2)

Published 15 Mar 2024 in cs.CL and cs.AI

Abstract: Pretraining LLMs on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based prompting or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book", in-domain setting. In RAFT, given a question and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This, coupled with RAFT's chain-of-thought-style response, helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs for in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.

Citations (108)

Summary

  • The paper demonstrates how fine-tuning LLMs with a mix of oracle and distractor documents improves domain-specific retrieval accuracy.
  • RAFT employs a novel retrieval-augmented fine-tuning methodology that simulates open-book testing to enhance reasoning and citation precision.
  • Experimental results on datasets like PubMed QA and HotpotQA show RAFT’s robust performance compared to traditional fine-tuning approaches.

Adapting LLMs to Domain-Specific Retrieval-Augmented Generation with RAFT

Introduction

The paper "RAFT: Adapting LLM to Domain Specific RAG" presents a novel approach for fine-tuning LLMs in specialized domains using Retrieval-Augmented Generation (RAG). This approach, Retrieval Augmented Fine Tuning (RAFT), is designed to improve LLM performance in domain-specific, open-book settings. The primary goal is to enhance the model's ability to reason and cite relevant documents to answer questions accurately.

Challenges in Adaptation

LLMs are increasingly pivotal in specialized domains, where they require not just general knowledge but high precision over specific document sets, such as enterprise documentation or recent publication databases. In such settings, adaptation can proceed either through in-context learning with RAG or through supervised fine-tuning. In-context learning with RAG lets the model reference documents while answering a question, but it does not prepare the model for the open-book nature of the test setting, where judging document relevance is critical.

Supervised fine-tuning, by contrast, lets the model learn the patterns of a fixed domain and align with end-user needs, but it typically does not account for the imperfections of document retrieval at test time. The paper frames these challenges as preparing for an open-book exam: RAFT simulates effective study practice by training the model to reason over the relevant documents and cite them.
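
To make the open-book setting concrete, below is a minimal Python sketch of plain in-context RAG at inference time: a retriever supplies the top-k documents, some possibly irrelevant, and the model must answer from that context. The retriever and llm objects are hypothetical stand-ins, not part of the RAFT codebase.

```python
# Minimal sketch of the open-book RAG setting that RAFT prepares models for.
# `retriever.search` and `llm.generate` are hypothetical interfaces.

def answer_open_book(question: str, retriever, llm, k: int = 4) -> str:
    """Retrieve the top-k documents and ask the model to answer only from them."""
    docs = retriever.search(question, k=k)  # may include irrelevant (distractor) hits
    context = "\n\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(docs))
    prompt = (
        "Answer the question using only the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm.generate(prompt)
```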

RAFT Methodology

RAFT combines supervised fine-tuning with RAG by training the model to discern and cite relevant documents while ignoring distractor documents. This approach builds robustness against retrieval inaccuracies, fostering domain-specific expertise without forgoing general retrieval efficiency. The training data is curated to include both 'oracle' documents containing the relevant information and distractor documents irrelevant to the query context. This dual-document strategy forces the model to distinguish pertinent from non-pertinent documents during training (Figure 1).

Figure 1: RAFT approach leverages fine-tuning with question-answer pairs while referencing documents in a simulated imperfect retrieval setting to prepare for open-book exam settings.
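
A rough sketch of how one RAFT-style training example could be assembled is shown below: the oracle document is mixed with sampled distractors, and the target is a chain-of-thought answer that quotes the oracle verbatim. Function and field names are illustrative assumptions, not taken from the released code.

```python
import random

def make_raft_example(question: str,
                      oracle_doc: str,
                      distractor_pool: list[str],
                      cot_answer: str,
                      num_distractors: int = 3) -> dict:
    """Pair a question with its oracle document plus sampled distractors,
    shuffled so document position carries no signal. The target `cot_answer`
    reasons step by step and cites the oracle passage verbatim."""
    distractors = random.sample(distractor_pool, num_distractors)
    context_docs = distractors + [oracle_doc]
    random.shuffle(context_docs)  # hide the oracle among the distractors
    return {
        "question": question,
        "context": context_docs,
        "answer": cot_answer,  # reasoning + verbatim quote from the oracle document
    }
```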

Experimental Evaluation

RAFT was evaluated on multiple datasets, including PubMed QA, HotpotQA, and the Gorilla API benchmarks, demonstrating significant improvements in domain-specific question answering over conventional methods. In these experiments, RAFT consistently outperformed both domain-specific fine-tuning and general RAG settings, underlining its efficacy in specialized contexts (Figure 2).

Figure 2: Results on NQ, TQA, and HotpotQA suggest that mixing a fraction of data without the oracle document in its context is helpful for in-domain RAG.

One critical finding was that training with a mix of oracle and distractor documents, rather than always including the oracle document, yields better retrieval performance, challenging the assumption that strict oracle inclusion is optimal. This mixing also improves robustness to the number of documents supplied by the retriever at test time (Figure 3).

Figure 3: Study on robustness to the varying number of test-time documents provided by the retriever highlights training with 4 documents as optimal for NQ and 2 for HotpotQA.
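
As a rough illustration of this mixing recipe, the sketch below keeps the oracle document for only a fraction p_oracle of training examples and gives the rest distractor-only contexts, so the model must also fall back on memorized domain knowledge when retrieval fails. The fraction and field names are assumptions for illustration, not the paper's reported settings.

```python
import random

def build_raft_dataset(examples: list[dict],
                       p_oracle: float = 0.8,
                       num_distractors: int = 4) -> list[dict]:
    """Keep the oracle document in context for a fraction `p_oracle` of
    examples; give the remaining examples distractor-only contexts."""
    dataset = []
    for ex in examples:
        if random.random() < p_oracle:
            context = random.sample(ex["distractor_pool"], num_distractors) + [ex["oracle_doc"]]
        else:
            context = random.sample(ex["distractor_pool"], num_distractors + 1)
        random.shuffle(context)
        dataset.append({
            "question": ex["question"],
            "context": context,
            "answer": ex["cot_answer"],  # chain-of-thought target, citing the oracle when present
        })
    return dataset
```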

Implications and Future Work

RAFT offers potent strategies for enhancing LLMs in domain-specific settings by refining their ability to reason through retrieved documents, potentially influencing future LLM developments in specialized applications. Its design principles of leveraging distractor documents and chain-of-thought answers suggest pathways for refined document processing capabilities.

The broader implications suggest the potential for smaller, fine-tuned models to match or exceed general-purpose LLMs on domain-specific tasks. RAFT’s open-source availability encourages widespread application and further experimentation in the domain-specific alignment of LLMs.

Conclusion

RAFT proposes an efficient methodology for adapting LLMs to domain-specific retrieval tasks by integrating distractor documents into fine-tuning processes. The approach successfully enhances retrieval performance and robustness, offering significant potential for future LLM advancements in specialized contexts. The paper's insights underline the necessity for domain-specific adaptation strategies, paving the way for improved application-specific LLM deployments.
