Multi-Conditional Ranking with Large Language Models

(2404.00211)
Published Mar 30, 2024 in cs.CL and cs.LG

Abstract

Utilizing LLMs to rank a set of items has become a common approach in recommendation and retrieval systems. Typically, these systems focus on ordering a substantial number of documents in a monotonic order based on a given query. However, real-world scenarios often present a different challenge: ranking a comparatively smaller set of items, but according to a variety of diverse and occasionally conflicting conditions. In this paper, we define and explore the task of multi-conditional ranking by introducing MCRank, a benchmark tailored for assessing multi-conditional ranking across various item types and conditions. Our analysis of LLMs using MCRank indicates a significant decrease in performance as the number and complexity of items and conditions grow. To overcome this limitation, we propose a novel decomposed reasoning method, consisting of EXtracting and Sorting the conditions, and then Iteratively Ranking the items (EXSIR). Our extensive experiments show that this decomposed reasoning method enhances LLMs' performance significantly, achieving up to a 12% improvement over existing LLMs. We also provide a detailed analysis of LLM performance across various condition categories, and examine the effectiveness of the decomposition step. Furthermore, we compare our method with existing approaches such as Chain-of-Thought and an encoder-type ranking model, demonstrating the superiority of our approach and the complexity of the MCR task. We released our dataset and code.

Comparison of EXSIR and zero-shot CoT on paragraph-level items, including ColBERT as a benchmark.

Overview

  • This paper introduces the concept of multi-conditional ranking (MCR), provides a specialized benchmark named MCRank for evaluating LLMs on MCR tasks, and proposes a novel method, EXSIR, to enhance LLMs' performance.

  • MCRank tests LLMs' abilities to rank items based on multiple conditions across various scenarios, aiming to reflect real-world applications like recommendation and sorting systems.

  • The EXSIR method, which stands for EXtract and Sort the conditions, then Iteratively Rank the items, significantly improves LLMs' performance on complex MCR tasks by decomposing the reasoning process.

  • Experimental results show that EXSIR improves the accuracy of LLMs such as GPT-4 by up to 12% on the MCRank benchmark.

Multi-Conditional Ranking with LLMs: Introducing MCRank and the EXSIR Method

Introduction

Recommendation and retrieval systems are ubiquitous on digital platforms, and they depend on effective methods for ranking a set of items. While significant progress has been made in ranking large document collections, the challenge of ranking a smaller set of items based on multiple and potentially conflicting conditions has been less explored. This paper addresses this gap by defining the task of multi-conditional ranking (MCR), presenting MCRank, a benchmark tailored for evaluating MCR across various item types and conditions, and proposing a novel decomposed reasoning method, EXSIR, for enhancing LLM performance on MCR tasks.

MCRank Benchmark

MCRank is designed to rigorously test LLMs' abilities in multi-conditional ranking tasks. The benchmark includes diverse categories of conditions such as positional, locational, temporal, trait-based, and reasoning types, across scenarios involving one to three conditions and sets of 3, 5, or 7 items, classified into token-level and paragraph-level items. The crafted dataset allows for comprehensive evaluation of model capability in handling complex ranking tasks that are closer to real-world applications like recommendation systems, educational question ordering, and job application sorting.
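
The settings described above can be enumerated as a small grid. The sketch below is illustrative only: it assumes every combination of condition count, item count, and item type is evaluated, which this summary does not confirm, and the names `CONDITION_COUNTS`, `ITEM_COUNTS`, `ITEM_LEVELS`, and `Setting` are hypothetical rather than taken from the released code.

```python
from dataclasses import dataclass
from itertools import product

# Values taken from the benchmark description; the names are hypothetical.
CONDITION_COUNTS = (1, 2, 3)
ITEM_COUNTS = (3, 5, 7)
ITEM_LEVELS = ("token", "paragraph")


@dataclass(frozen=True)
class Setting:
    """One evaluation setting: how many conditions, how many items, item type."""
    num_conditions: int
    num_items: int
    item_level: str


# Assumption: the full cross product of the three dimensions is evaluated.
settings = [
    Setting(c, n, level)
    for c, n, level in product(CONDITION_COUNTS, ITEM_COUNTS, ITEM_LEVELS)
]

print(len(settings))  # 3 * 3 * 2 combinations
```

Under that assumption, the hardest setting in the grid would be three conditions over seven paragraph-level items.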

EXSIR: A Decomposed Reasoning Method

This paper introduces EXSIR (EXtract and Sort the conditions, then Iteratively Rank the items), a decomposed reasoning method that significantly improves LLMs' performance on multi-conditional ranking tasks. The method first extracts the conditions and sorts them by priority, then applies the sorted conditions one at a time to rank the items. This approach counters the performance decline that LLMs, including GPT-4, ChatGPT, and Mistral, show as the complexity of the ranking task increases.
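
The three steps can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the function `exsir` and its three callable parameters are hypothetical, and in practice each callable would presumably wrap an LLM prompt. The toy stand-ins below are plain Python so the example runs without a model, and the choice to apply the lowest-priority condition first is an assumption made for this sketch.

```python
from typing import Callable, List


def exsir(
    instruction: str,
    items: List[str],
    extract: Callable[[str], List[str]],
    sort_conditions: Callable[[List[str]], List[str]],
    rank_step: Callable[[List[str], str], List[str]],
) -> List[str]:
    """Extract conditions, order them, then apply them one at a time."""
    conditions = extract(instruction)      # step 1: extract the conditions
    ordered = sort_conditions(conditions)  # step 2: sort them by priority
    ranking = list(items)
    for condition in ordered:              # step 3: iterative ranking
        ranking = rank_step(ranking, condition)
    return ranking


# Toy stand-ins for the three model calls (hypothetical, no LLM involved).
def toy_extract(instruction: str) -> List[str]:
    # Conditions are separated by ";" and listed from highest to lowest priority.
    return [part.strip() for part in instruction.split(";") if part.strip()]


def toy_sort(conditions: List[str]) -> List[str]:
    # Assumption: apply the lowest-priority condition first, so that the
    # highest-priority condition is applied last and decides the final order.
    return list(reversed(conditions))


def toy_rank_step(ranking: List[str], condition: str) -> List[str]:
    # sorted() is stable, so ties keep the order produced by earlier steps.
    if condition == "alphabetical":
        return sorted(ranking)
    if condition == "shortest first":
        return sorted(ranking, key=len)
    raise ValueError(f"unknown condition: {condition}")


result = exsir(
    "shortest first; alphabetical",
    ["pear", "fig", "apple", "kiwi"],
    toy_extract,
    toy_sort,
    toy_rank_step,
)
print(result)  # ['fig', 'kiwi', 'pear', 'apple']
```

Because each step sees only one condition, the ranking call stays simple even when the original instruction mixes several conditions; the two four-letter items end up in alphabetical order because the lower-priority condition was applied first.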

Experimental Results

Evaluating LLMs on MCRank with EXSIR shows notable performance improvements across various settings, with GPT-4 gaining up to 12% in accuracy. This highlights the effectiveness of the decomposed reasoning method in strengthening LLMs' capacity to handle intricate multi-conditional ranking tasks. A detailed analysis of performance across condition categories and of the success of the decomposition step further supports the robustness of the EXSIR method.
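
For readers who want to reproduce an accuracy figure of this kind, one plausible scoring rule is exact match on the full ranking. The sketch below assumes that definition, which this summary does not spell out; `exact_match_accuracy` is a hypothetical helper, not part of the released code.

```python
from typing import List, Sequence


def exact_match_accuracy(
    predictions: Sequence[List[str]], references: Sequence[List[str]]
) -> float:
    """Fraction of examples whose predicted order equals the reference order.

    Assumption: a ranking counts as correct only if every position matches.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


predictions = [["a", "b", "c"], ["b", "a", "c"]]
references = [["a", "b", "c"], ["a", "b", "c"]]
print(exact_match_accuracy(predictions, references))  # 0.5
```

Exact match is strict: swapping two adjacent items scores the same as a completely reversed list, so results under this rule become harder to achieve as the number of items grows.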

Implications and Future Directions

The findings from this research have both practical and theoretical implications. Practically, the EXSIR method and the MCRank benchmark lay the groundwork for developing more sophisticated ranking systems that can handle multiple, potentially conflicting conditions. Theoretically, the study adds to our understanding of decomposed reasoning in AI and its application in enhancing LLM performance.

Future research might explore extending the EXSIR method to other forms of decomposed reasoning tasks beyond ranking, assessing the viability of incorporating user interaction in ranking systems, and evaluating the potential of multi-agent systems where tasks are divided among specialized models for improved efficiency.

Conclusion

This paper presents a significant step forward in the domain of multi-conditional ranking, introducing the comprehensive MCRank benchmark and the EXSIR method. Experimentation demonstrates the enhanced capability of LLMs in accurately performing multi-conditional ranking tasks when leveraging decomposed reasoning. These contributions are expected to facilitate future advancements in the development of more effective and sophisticated recommendation and retrieval systems.