- The paper presents FIT-RAG, which optimizes retrieval-augmented generation by scoring documents for factual content alongside LLM preferences.
- It employs a bi-label document scorer and a self-knowledge recognizer to effectively balance accuracy and token efficiency.
- Experiments report accuracy gains of up to 27.5% and roughly halved token usage on benchmarks including TriviaQA, NQ, and PopQA.
The paper "FIT-RAG: Black-Box RAG with Factual Information and Token Reduction" introduces a framework designed to optimize the integration of retrieval-augmented generation (RAG) systems with LLMs treated as black-box entities. This approach endeavors to balance the effectiveness and efficiency of RAG systems by incorporating factual information and reducing token usage.
Introduction to Black-Box RAG
Because fine-tuning LLMs with billions of parameters for every knowledge update is computationally prohibitive, black-box RAG systems instead fine-tune only the retrieval component to align with the LLM's preferences, leaving the LLM itself unmodified.
A significant weakness of existing black-box RAG methods is that they retrieve documents the LLM prefers but that may lack the relevant factual knowledge. This creates two problems: the retrieved context can mislead the LLM, and concatenating entire documents into the prompt inflates token counts unnecessarily.
Figure 1: Examples illustrating LLM-preferred retrieved documents that do not contain relevant factual information.
FIT-RAG Overview
FIT-RAG addresses these challenges with an architecture comprising five components (a minimal end-to-end sketch in Python follows Figure 2):
- Similarity-Based Retrieval: Retrieves an initial pool of candidate documents using vector similarity.
- Bi-Label Document Scorer: Scores each document on two labels, factual content (Has_Answer) and LLM preference (LLM_Prefer), covering both factual relevance and LLM alignment.
- Bi-Faceted Self-Knowledge Recognizer: Decides whether external knowledge is needed at all, estimating the LLM's self-knowledge from long-tail knowledge indicators and similar neighboring questions.
- Sub-Document-Level Token Reducer: Selects compact sub-documents that retain the needed content, cutting excess input tokens.
- Prompt Construction: Applies tailored prompt templates depending on whether retrieval augmentation is used.
Figure 2: The overview of FIT-RAG.
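To make the flow concrete, here is a minimal sketch of the inference pipeline described above. All names (`retriever`, `scorer`, `recognizer`, `reducer` and their methods) are hypothetical stand-ins for the paper's modules, not the authors' code, and the prompt templates are likewise illustrative.

```python
# Illustrative sketch of the FIT-RAG inference flow; component interfaces
# are assumptions, not the paper's actual implementation.

def fit_rag_answer(question, llm, retriever, scorer, recognizer, reducer):
    # 1. Bi-faceted self-knowledge check: skip retrieval if the LLM
    #    likely already knows the answer.
    if recognizer.has_self_knowledge(question):
        prompt = f"Answer the question.\nQuestion: {question}\nAnswer:"
        return llm.generate(prompt)

    # 2. Similarity-based retrieval of candidate documents.
    candidates = retriever.retrieve(question, top_k=100)

    # 3. Bi-label scoring: factual content and LLM preference.
    scored = [(doc, scorer.score(question, doc)) for doc in candidates]

    # 4. Sub-document-level token reduction.
    context = reducer.reduce(question, scored)

    # 5. Prompt construction for the retrieval-augmented case.
    prompt = (f"Answer the question given the references.\n"
              f"References: {context}\nQuestion: {question}\nAnswer:")
    return llm.generate(prompt)
```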
Bi-Label Document Scorer
One core innovation is the bi-label document scorer, which evaluates each retrieved document on two labels simultaneously: whether it contains the factual answer (Has_Answer) and whether the LLM finds it useful (LLM_Prefer). Because the two labels are heavily imbalanced in practice, the scorer is trained with a data-imbalance-aware bi-label learning approach in which the loss weights are adjusted via hypergradient descent.
Figure 3: The training process of the Bi-Label Document Scorer.
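Below is a minimal sketch of such a two-head scorer, assuming a BERT-style encoder backbone; the paper's actual architecture and training details may differ, and the imbalance-aware weighting is only indicated in the trailing comment.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BiLabelScorer(nn.Module):
    """Two-head scorer producing P(Has_Answer) and P(LLM_Prefer)
    for an encoded (question, document) pair. Backbone choice and
    head design here are assumptions, not the paper's specification."""

    def __init__(self, backbone="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.has_answer_head = nn.Linear(hidden, 1)  # factual-content label
        self.llm_prefer_head = nn.Linear(hidden, 1)  # LLM-preference label

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS]-token representation
        return (torch.sigmoid(self.has_answer_head(cls)).squeeze(-1),
                torch.sigmoid(self.llm_prefer_head(cls)).squeeze(-1))

# Training combines two binary cross-entropy terms,
#   loss = w1 * bce(p_has_answer, y1) + w2 * bce(p_llm_prefer, y2),
# where the weights w1, w2 are themselves updated by hypergradient
# descent to counter the skewed label distribution.
```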
Bi-Faceted Self-Knowledge Recognizer
This component assesses whether retrieval is necessary by estimating the LLM's self-knowledge from two facets: whether the question involves long-tail knowledge, and how the LLM fared on similar neighboring questions. If a question concerns neither long-tail nor recently updated knowledge (judged, for example, from Wikipedia page-view statistics), retrieval can be skipped entirely, conserving tokens and compute.
Figure 4: The inference process of the Bi-Faceted Self-Knowledge Recognizer.
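A minimal sketch of this decision, assuming a page-view lookup, a nearest-neighbor index over past questions, and per-question correctness labels; the threshold values and the way the two facets are combined are assumptions for illustration.

```python
import numpy as np

def has_self_knowledge(question, entity_pageviews, knn_index, knn_labels,
                       pageview_threshold=10_000, k=5):
    """Bi-faceted self-knowledge check (illustrative; interfaces and
    thresholds are assumptions).

    Facet 1 (long-tail): if the question's entities are frequently viewed
    on Wikipedia, they are unlikely to be long-tail knowledge the LLM lacks.
    Facet 2 (neighbors): if most of the k most similar past questions were
    answered correctly without retrieval, assume the LLM can answer this one.
    """
    # Least-viewed entity must still be popular enough.
    popular = min(entity_pageviews(question), default=0) >= pageview_threshold

    # Majority of nearest-neighbor questions answered correctly?
    neighbor_ids = knn_index.query(question, k=k)
    neighbors_ok = np.mean([knn_labels[i] for i in neighbor_ids]) >= 0.5

    # Skip retrieval only when both facets agree (conservative choice).
    return popular and neighbors_ok
```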
Token Efficiency through Sub-Document-Level Reduction
FIT-RAG splits retrieved documents into sub-documents, scores the sub-documents for relevance to the question, and filters the ranked shortlist into a compact augmentation set for the LLM. This minimizes token usage without discarding the detail needed to answer.
Figure 5: The inference process of Sub-document-level Token Reducer.
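The sketch below illustrates the idea with a simple greedy stand-in for the paper's learned selector: split each document into fixed-size sentence windows, re-score the windows against the question, and keep the best ones within a token budget. The `scorer.relevance` and `tokenizer.encode` interfaces and the default budget are assumptions.

```python
def reduce_tokens(question, scored_docs, scorer, tokenizer,
                  window=3, budget=512):
    """Sub-document-level token reduction (illustrative greedy policy,
    not the paper's actual selector)."""
    # Split each document into windows of `window` sentences
    # (crude sentence splitting for brevity).
    subdocs = []
    for doc, _ in scored_docs:
        sentences = doc.split(". ")
        for i in range(0, len(sentences), window):
            subdocs.append(". ".join(sentences[i:i + window]))

    # Rank sub-documents by relevance to the question.
    ranked = sorted(subdocs,
                    key=lambda s: scorer.relevance(question, s),
                    reverse=True)

    # Greedily keep the best sub-documents within the token budget.
    selected, used = [], 0
    for sub in ranked:
        n = len(tokenizer.encode(sub))
        if used + n > budget:
            continue
        selected.append(sub)
        used += n
    return "\n".join(selected)
```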
This step is essential for efficiency: it preserves answer quality while keeping the prompt free of unnecessary tokens.
Experimental Results
Experimental validation demonstrates the efficacy of FIT-RAG, improving factual accuracy and response quality across TriviaQA, NQ, and PopQA. Accuracy gains of up to 27.5% were recorded over baseline models, alongside roughly halved token usage.
Conclusions
FIT-RAG presents a practical framework for harnessing the power of LLMs without the cost and drawbacks of extensive fine-tuning. By maximizing informative content under a controlled token budget, it suits applications that demand real-time processing with constrained computational resources. Future directions may broaden the approach to multi-modal data or further optimize the retrieval mechanism.