Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 177 tok/s

Gemini 2.5 Pro 43 tok/s Pro

GPT-5 Medium 26 tok/s Pro

GPT-5 High 25 tok/s Pro

GPT-4o 119 tok/s Pro

Kimi K2 202 tok/s Pro

GPT OSS 120B 439 tok/s Pro

Claude Sonnet 4.5 38 tok/s Pro

2000 character limit reached

GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval (2310.20158v1)

Published 31 Oct 2023 in cs.CL

Abstract: Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents. Combining LLMs with embedding-based retrieval models, recent work shows promising results on the zero-shot retrieval problem, i.e., no access to labeled data from the target domain. Two such popular paradigms are generation-augmented retrieval or GAR (generate additional context for the query and then retrieve), and retrieval-augmented generation or RAG (retrieve relevant documents as context and then generate answers). The success of these paradigms hinges on (i) high-recall retrieval models, which are difficult to obtain in the zero-shot setting, and (ii) high-precision (re-)ranking models which typically need a good initialization. In this work, we propose a novel GAR-meets-RAG recurrence formulation that overcomes the challenges of existing paradigms. Our method iteratively improves retrieval (via GAR) and rewrite (via RAG) stages in the zero-shot setting. A key design principle is that the rewrite-retrieval stages improve the recall of the system and a final re-ranking stage improves the precision. We conduct extensive experiments on zero-shot passage retrieval benchmarks, BEIR and TREC-DL. Our method establishes a new state-of-the-art in the BEIR benchmark, outperforming previous best results in Recall@100 and nDCG@10 metrics on 6 out of 8 datasets, with up to 17% relative gains over the previous best.

Citations (14)

View on Semantic Scholar