A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models (2310.09497v2)
Abstract: We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through a first-of-its-kind comparative evaluation within a consistent experimental framework, considering factors such as model size, token consumption, and latency, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach instead reduces the number of LLM inferences and the amount of prompt tokens consumed during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while retaining high zero-shot ranking effectiveness. We make our code and results publicly available at \url{https://github.com/ielab/LLM-rankers}.
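To illustrate the core idea, below is a minimal Python sketch of how a Setwise comparison can drive a top-k re-ranking loop: each LLM call judges a small set of passages at once, so far fewer inferences are needed than with exhaustive Pairwise comparisons. The prompt wording, the `pick_best` and `setwise_top_k` helpers, and the selection-style sorting loop are illustrative assumptions, not the implementation released in the repository.

```python
# Sketch of Setwise comparison as the inner step of a top-k ranking loop.
# The prompt format, helper names, and `llm` callable are illustrative only.
from typing import Callable, List, Sequence


def setwise_prompt(query: str, passages: Sequence[str]) -> str:
    """Build one prompt asking the LLM to pick the most relevant passage in a set."""
    labelled = "\n".join(
        f"Passage {chr(ord('A') + i)}: {p}" for i, p in enumerate(passages)
    )
    return (
        f'Given the query: "{query}", which of the following passages is the '
        f"most relevant?\n{labelled}\n"
        f"Answer only with the passage label (A, B, ...)."
    )


def pick_best(query: str, passages: Sequence[str], llm: Callable[[str], str]) -> int:
    """One Setwise inference: index of the passage the LLM judges most relevant."""
    answer = llm(setwise_prompt(query, passages)).strip().upper()
    idx = ord(answer[0]) - ord("A") if answer else 0
    return idx if 0 <= idx < len(passages) else 0


def setwise_top_k(query: str, passages: List[str], k: int,
                  llm: Callable[[str], str], set_size: int = 4) -> List[str]:
    """Selection-style top-k: each round compares `set_size` passages per LLM call,
    so finding the next-best passage costs roughly (n - 1) / (set_size - 1) calls
    rather than the O(n^2) comparisons of exhaustive Pairwise ranking."""
    pool = list(passages)
    ranked: List[str] = []
    while pool and len(ranked) < k:
        best = 0
        # Compare the current best against the next (set_size - 1) candidates.
        for start in range(1, len(pool), set_size - 1):
            group_idx = [best] + list(range(start, min(start + set_size - 1, len(pool))))
            winner = pick_best(query, [pool[i] for i in group_idx], llm)
            best = group_idx[winner]
        ranked.append(pool.pop(best))
    return ranked
```

With any model wrapped as a callable, e.g. `llm = lambda prompt: my_model.generate(prompt)`, a call such as `setwise_top_k(query, candidates, k=10, llm=llm)` re-ranks a first-stage candidate list; the set size trades off prompt length per call against the number of calls issued.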
Authors: Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon