Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard (2012.08020v3)

Published 15 Dec 2020 in cs.CL and cs.IR

Abstract: This short document describes a traditional IR system that achieved MRR@100 equal to 0.298 on the MS MARCO Document Ranking leaderboard (on 2020-12-06). Although inferior to most BERT-based models, it outperformed several neural runs (as well as all non-neural ones), including two submissions that used a large pretrained Transformer model for re-ranking. We provide software and data to reproduce our results.

Summary

The paper shows that a carefully tuned traditional IR pipeline can rival many neural models on the MS MARCO Document Ranking leaderboard, reaching an MRR@100 of 0.298.
It combines BM25 candidate generation with LambdaMART re-ranking and uses IBM Model 1 translation features to improve query-document matching.
The results suggest that traditional IR techniques can be a cost-effective alternative to resource-intensive neural approaches while remaining competitive.

Traditional Information Retrieval Systems Against Neural Models in Document Ranking

The paper "Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard" by Leonid Boytsov examines how well a traditional (non-neural) information retrieval (IR) system performs in direct competition with neural models. It reports a Mean Reciprocal Rank at cutoff 100 (MRR@100) of 0.298 on the MS MARCO Document Ranking task. The system is weaker than most BERT-based models, but it outperformed all non-neural runs and several neural ones, including two submissions that re-ranked with a large pretrained Transformer.
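As a concrete reference for the metric, the sketch below computes MRR@100 in Python: each query contributes the reciprocal rank of its first relevant document within the top 100 results, or 0 if none appears there. The function name `mrr_at_k` and the input layout (dictionaries keyed by query id) are illustrative assumptions; this is not the official MS MARCO evaluation script.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 100) -> float:
    """Mean Reciprocal Rank with cutoff k (illustrative sketch).

    rankings: query id -> ranked list of document ids (best first)
    qrels:    query id -> set of relevant document ids
    Averages over all judged queries; a query whose first relevant
    document is not in the top k contributes 0.
    """
    total = 0.0
    for qid, relevant in qrels.items():
        ranked = rankings.get(qid, [])
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(qrels) if qrels else 0.0


# Toy example: q1 finds its relevant doc at rank 2, q2 finds none.
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d2"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}}
print(mrr_at_k(rankings, qrels))  # 0.25
```

An MRR@100 of 0.298 therefore means that, averaged over queries, the first relevant document appears on average a little above rank 3 in reciprocal terms.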

System Design and Implementation

The work builds a carefully engineered traditional IR pipeline, challenging the apparent dominance of neural models in current retrieval tasks. The implementation uses FlexNeuART, a retrieval toolkit that processes documents stored as multi-field JSON. Each MS MARCO document is split into fields such as URL, title, and body, and each field is tokenized and further pre-processed.
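To make the multi-field layout concrete, here is a minimal Python sketch that turns one document into a JSON line with separately tokenized `url`, `title`, and `body` fields. The `doc_id` key, the field names, the regex tokenizer, and the tiny stopword list are hypothetical simplifications; FlexNeuART's actual input schema and pre-processing are not reproduced here.

```python
import json
import re
from typing import Dict, List

# Tiny illustrative stopword list (assumption); a real pipeline would use a full list.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "is", "for"}

# Tokens are maximal runs of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def to_multifield_json(doc_id: str, url: str, title: str, body: str) -> str:
    """Serialize one document as a JSON line with separately tokenized fields.

    The keys below are hypothetical and are not FlexNeuART's actual schema.
    """
    record: Dict[str, str] = {
        "doc_id": doc_id,
        "url": " ".join(tokenize(url)),
        "title": " ".join(tokenize(title)),
        "body": " ".join(tokenize(body)),
    }
    return json.dumps(record)


print(to_multifield_json(
    "D1",
    "https://en.wikipedia.org/wiki/BM25",
    "Okapi BM25",
    "BM25 is a ranking function used by search engines.",
))
```

Keeping fields separate lets later stages compute per-field scores (for example, BM25 on the title and on the body independently) and feed them to the re-ranker as distinct features.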

Features and Ranking Mechanisms

The system uses a two-stage approach: BM25 generates candidates, and a LambdaMART model re-ranks them using a set of 13 features. These features combine standard scores, including BM25, cosine similarity, and proximity scores, with lexical translation features such as IBM Model 1 log-scores.
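The two-stage design can be sketched in Python as follows: an in-memory BM25 index returns the top-k candidates, and a second stage re-orders them by applying a scoring model to per-candidate feature vectors. This is a simplified illustration under several assumptions: `k1=1.2` and `b=0.75` are common defaults rather than the paper's tuned values, the IDF is one common smoothed variant, and the `model` argument stands in for a trained LambdaMART ranker, which is not implemented here.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


class BM25Index:
    """In-memory BM25 over pre-tokenized documents (k1 and b are assumed defaults)."""

    def __init__(self, docs: Dict[str, List[str]], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {d: Counter(toks) for d, toks in docs.items()}
        self.len = {d: len(toks) for d, toks in docs.items()}
        self.n = len(docs)
        self.avgdl = sum(self.len.values()) / self.n if self.n else 0.0
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for toks in docs.values() for t in set(toks))

    def idf(self, term: str) -> float:
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: Sequence[str], doc_id: str) -> float:
        tf, dl = self.tf[doc_id], self.len[doc_id]
        s = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f:  # f > 0 implies dl > 0, so avgdl > 0 here
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf(t) * f * (self.k1 + 1) / norm
        return s

    def top_k(self, query: Sequence[str], k: int) -> List[str]:
        """First stage: return the k highest-scoring documents (ties broken by id)."""
        scored = sorted((( -self.score(query, d), d) for d in self.tf))
        return [d for _, d in scored[:k]]


def rerank(
    query: Sequence[str],
    candidates: List[str],
    features: Callable[[Sequence[str], str], List[float]],
    model: Callable[[List[float]], float],
) -> List[str]:
    """Second stage: order candidates by the model's score of their feature vectors."""
    return sorted(candidates, key=lambda d: -model(features(query, d)))


docs = {
    "d1": ["bm25", "ranking", "function"],
    "d2": ["neural", "ranking", "model", "bert"],
    "d3": ["cooking", "pasta", "recipe"],
}
index = BM25Index(docs)
query = ["ranking", "function"]


def toy_features(q: Sequence[str], d: str) -> List[float]:
    # Hypothetical 2-feature vector: BM25 score and a binary indicator.
    return [index.score(q, d), 1.0 if "neural" in docs[d] else 0.0]


def toy_model(v: List[float]) -> float:
    # Hypothetical linear stand-in for a trained LambdaMART model.
    return 0.1 * v[0] + 1.0 * v[1]


candidates = index.top_k(query, k=2)
print(candidates)  # ['d1', 'd2']
print(rerank(query, candidates, toy_features, toy_model))  # ['d2', 'd1']
```

The toy features and weights exist only to show that the second stage can reorder what BM25 retrieved; a real system would compute its full feature set and learn the combination with LambdaMART.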

IBM Model 1 is a core component. Borrowed from statistical machine translation, it learns word translation probabilities, which let the ranker match query terms to related, not only identical, document terms. Because IBM Model 1 is a statistical rather than a neural model, using it as a ranking feature keeps the system fully traditional.
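The following Python sketch computes a smoothed IBM Model 1 log-score, log P(Q | D) = Σ_q log[(1 - λ) Σ_w T(q | w) P(w | D) + λ P(q | C)], where T(q | w) is the probability that document word w translates to query word q, P(w | D) is the word's relative frequency in the document, and P(q | C) is a collection probability used for smoothing. It follows one common formulation from the IR literature and is not necessarily the exact variant used in the paper; the toy translation table, collection probabilities, λ = 0.1, and the unseen-word floor are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def model1_log_score(
    query: List[str],
    doc: List[str],
    trans: Dict[str, Dict[str, float]],
    coll_prob: Dict[str, float],
    lam: float = 0.1,
    unseen_prob: float = 1e-9,
) -> float:
    """Smoothed IBM Model 1 log-likelihood log P(query | doc).

    trans[w][q] = T(q | w): probability that document word w translates to query word q.
    coll_prob[q] = P(q | C): collection probability used for smoothing.
    lam and unseen_prob are illustrative smoothing choices (assumptions).
    """
    counts = Counter(doc)
    dlen = len(doc)
    total = 0.0
    for q in query:
        # Translation part: sum over document words of T(q | w) * P(w | D).
        p_trans = 0.0
        if dlen:
            for w, c in counts.items():
                p_trans += trans.get(w, {}).get(q, 0.0) * c / dlen
        p = (1 - lam) * p_trans + lam * coll_prob.get(q, unseen_prob)
        total += math.log(p)
    return total


trans = {  # toy table T(q | w), keyed by document word w
    "car": {"car": 0.6, "automobile": 0.3},
    "engine": {"engine": 0.7, "motor": 0.2},
}
coll = {"car": 0.01, "automobile": 0.001, "engine": 0.005, "motor": 0.002}

# Neither document contains "automobile", but only the first contains "car".
with_car = model1_log_score(["automobile"], ["car", "engine"], trans, coll)
without_car = model1_log_score(["automobile"], ["engine", "engine"], trans, coll)
print(with_car > without_car)  # True
```

The example shows the point of the feature: the document mentioning "car" gets a much higher score for the query "automobile" even though the query word never appears in it, something an exact-match score such as BM25 cannot capture.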

Performance Analysis and Implications

On the NIST-judged TREC 2019 and 2020 query sets, the system achieved NDCG@10 of 0.584 and 0.558, respectively, roughly 6-7% higher than tuned BM25 baselines. This suggests that a traditional system can be effective when its features and parameters are carefully engineered and tuned, and when it draws on statistical models such as IBM Model 1.
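For reference, NDCG@10 can be computed per query as below: the discounted cumulative gain of the top 10 results, normalized by the gain of an ideal ordering of the judged documents. This Python sketch assumes linear gains (the relevance grade itself) and a log2(rank + 1) discount; some evaluation tools use other gain definitions, so this function is not guaranteed to reproduce official TREC numbers.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked: List[str], grades: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query with graded judgments (unjudged docs count as 0)."""

    def dcg(gains: List[float]) -> float:
        # Position i (0-based) is rank i + 1, discounted by log2(rank + 1).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    actual = dcg([grades.get(d, 0) for d in ranked])
    ideal = dcg(sorted(grades.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


grades = {"a": 3, "b": 1}
print(ndcg_at_k(["a", "b"], grades))  # 1.0 (ideal order)
print(ndcg_at_k(["b", "a"], grades))  # below 1.0: the best document is demoted
```

A reported system-level NDCG@10 is the mean of this per-query value over all judged queries.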

These findings matter for both practice and research. Practically, they suggest that traditional IR systems can deliver competitive accuracy under resource constraints, offering a cheaper alternative to neural models that typically require substantial compute and extensive pre-training. For research, they question the assumption that neural models should always be preferred and point to remaining room for optimizing traditional algorithms.

Prospects for Future Research

Future work could study the scalability and efficiency of such systems, particularly their computational speed and resource usage. Optimizing index-time computation and feature extraction could lower the cost of running such pipelines, and further improvements might narrow the remaining gap with neural models. Methods that combine traditional techniques with modern machine learning may also open new routes to better systems.

Overall, this work renews interest in traditional IR methods and argues for evaluating retrieval systems in a balanced way that draws on both established techniques and neural architectures.