
ReFIT: Relevance Feedback from a Reranker during Inference (2305.11744v2)

Published 19 May 2023 in cs.IR and cs.CL

Abstract: Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.


Summary

  • The paper proposes ReFIT, which enhances retrieval recall by distilling reranker outputs to update query representations via KL divergence.
  • It integrates an inference-time distillation process into a dual-step retrieval framework, yielding consistent improvements on benchmarks like BEIR and Mr.TyDi.
  • The approach is lightweight, architecture-agnostic, and applicable to various modalities including text-to-video and multilingual IR tasks with minimal latency increase.

ReFIT: Relevance Feedback from a Reranker during Inference

The paper "ReFIT: Relevance Feedback from a Reranker during Inference" (2305.11744) introduces an approach to improving the recall of information retrieval (IR) systems. The method, termed ReFIT, extends the classic retrieve-and-rerank (R&R) framework by using the reranker's output as inference-time relevance feedback to update the query representation in the retriever. This article examines the methodology, experimental results, and implications of the approach.

Methodology

The ReFIT approach integrates an inference-time distillation process into the traditional retrieve-and-rerank (R&R) framework to compute a new query vector that improves recall when used in a second retrieval step. The key idea is to update the retriever's query representation using the output of the more powerful cross-encoder reranker, by minimizing a distillation loss that aligns the retriever's score distribution with the reranker's.

Figure 1: The proposed method for re-ranker relevance feedback. We introduce an inference-time distillation process (step 3) into the traditional retrieve-and-rerank framework (steps 1 and 2) to compute a new query vector, which improves recall when used for a second retrieval step (step 4).

Concretely, the process begins with the standard R&R pipeline: an initial retrieval using a dual-encoder, followed by reranking of the top-K candidates with a cross-encoder. ReFIT then distills the reranker's scores into an updated query vector by minimizing the Kullback-Leibler (KL) divergence between the reranker's and the retriever's score distributions over those candidates. The updated query vector is used for a second retrieval step, which improves retrieval recall.
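Minimizing KL(p_reranker ‖ softmax(q·D)) with respect to the query vector q has a simple closed-form gradient, Σ_i (s_i − p_i)·d_i, where s is the retriever's softmax distribution over candidate embeddings d_i. Below is a minimal NumPy sketch of this inference-time update; the function names, learning rate, and step count are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def refit_update(query, docs, reranker_scores, lr=0.05, steps=100):
    """Distill reranker scores into the query vector by gradient descent on
    KL(p_reranker || softmax(docs @ q)).  `docs` holds only the K reranked
    candidates' embeddings, one per row (illustrative sketch)."""
    q = query.copy()
    p = softmax(reranker_scores)   # target distribution from the reranker
    for _ in range(steps):
        s = softmax(docs @ q)      # retriever's current score distribution
        grad = (s - p) @ docs      # closed-form gradient of the KL loss
        q -= lr * grad
    return q                       # used for the second retrieval step
```

Because the update runs over only the K reranked candidates' embeddings, it is cheap relative to the cross-encoder forward pass, which is why the overall latency overhead is small.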

Experimental Results

Experimental evaluations demonstrate significant improvements in retrieval recall across multiple domains, languages, and modalities. The method shows consistent gains on the BEIR benchmark and the multilingual Mr.TyDi benchmark, as well as in a multi-modal setup involving text-to-video retrieval.

Figure 2: t-SNE plots for four examples from BEIR, with the query vectors shown alongside the corresponding positive passages. The updated query vectors (in blue) after our relevance feedback approach are closer to the positive passages (in green) on average compared to the original query vectors (in red).

The results highlight the efficacy of ReFIT in achieving higher Recall@100 compared to both retrieval-only baselines such as BM25, DPR, and modern dual-encoders, and traditional R&R frameworks. Notably, ReFIT achieves these improvements with only a marginal increase in latency, in contrast to the alternative of reranking a larger pool of candidates.
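For reference, Recall@K (e.g., Recall@100 above) is the fraction of the known relevant documents that appear among the top-K retrieved results. A minimal sketch of the standard metric (the function name is illustrative):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    return len(top_k & relevant) / len(relevant)
```

For example, with relevant documents {2, 5} and ranking [1, 2, 3, 4], Recall@3 is 0.5, since only document 2 appears in the top 3.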

Discussion and Future Work

ReFIT provides a lightweight, parameter-free means to significantly enhance the recall capabilities of existing retrieval frameworks by merely updating query representations at inference time. Its architecture-agnostic nature makes it applicable across various domains, languages, and modalities without altering the underlying models. Future work could explore extending the relevance feedback concept to improve token-level query representations for better interpretability and examining the potential for iterative relevance feedback rounds.

The ability of ReFIT to leverage the reranker's discriminative power at inference time makes it a valuable tool for IR tasks where recall is critical, such as open-domain question answering and dialog generation.

Conclusion

This paper presents a methodological advancement in neural IR by harnessing inference-time relevance feedback from a reranker to refine query representations within the retriever model. ReFIT not only enhances performance metrics across multiple benchmarks but does so with efficiency, maintaining competitive inference times. Its implementation simplicity and demonstrated effectiveness across different IR setups signify its potential for widespread adoption in environments where retrieval accuracy and recall are paramount.
