
Abstract

BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of a representation learning approach to building retrieval models, typically using pretrained transformers in a supervised setting. This naturally raises the question: How effective are these models when presented with queries and documents that differ from the training data? Examples include searching in different domains (e.g., medical or legal text) and with different types of queries (e.g., keywords vs. well-formed questions). While BEIR was designed to answer these questions, our work addresses two shortcomings that prevent the benchmark from achieving its full potential. First, the sophistication of modern neural methods and the complexity of current software infrastructure create barriers to entry for newcomers. To this end, we provide reproducible reference implementations that cover the two main classes of approaches: learned dense and sparse models. Second, there does not exist a single authoritative nexus for reporting the effectiveness of different models on BEIR, which has led to difficulty in comparing different methods. To remedy this, we present an official self-service BEIR leaderboard that provides fair and consistent comparisons of retrieval models. By addressing both shortcomings, our work facilitates future explorations in a range of interesting research questions that BEIR enables.
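The abstract contrasts the two main classes of learned retrieval models: dense and sparse. As a minimal sketch of the distinction (not the paper's actual reference implementations, and with toy hand-picked weights standing in for real model output), dense retrieval scores a query against documents via the inner product of fixed-width vectors, while learned sparse retrieval scores via overlapping term weights in an inverted-index-friendly representation:

```python
# Toy contrast between the two model classes named in the abstract:
# dense retrieval (fixed-width vectors, inner product) vs. learned
# sparse retrieval (term -> weight maps, scored on overlapping terms).
# All embeddings and weights below are illustrative stand-ins, not
# the output of any real encoder.

def dense_score(query_vec, doc_vec):
    """Inner product between two fixed-width dense vectors."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def sparse_score(query_weights, doc_weights):
    """Sum of weight products over terms present in both representations."""
    return sum(w * doc_weights[t] for t, w in query_weights.items()
               if t in doc_weights)

# Dense: every query and document is a vector of the same dimension.
q_dense = [0.1, 0.7, 0.2]
docs_dense = {"d1": [0.0, 0.9, 0.1], "d2": [0.5, 0.1, 0.4]}

# Sparse: every query and document is a (mostly empty) term-weight map,
# which is what makes inverted-index retrieval possible.
q_sparse = {"covid": 1.2, "vaccine": 0.8}
docs_sparse = {"d1": {"covid": 0.9, "trial": 0.3},
               "d2": {"vaccine": 1.1, "policy": 0.2}}

dense_ranking = sorted(docs_dense,
                       key=lambda d: dense_score(q_dense, docs_dense[d]),
                       reverse=True)
sparse_ranking = sorted(docs_sparse,
                        key=lambda d: sparse_score(q_sparse, docs_sparse[d]),
                        reverse=True)

print(dense_ranking)   # d1 (score ~0.65) ranks above d2 (score 0.20)
print(sparse_ranking)  # d1 (score ~1.08) ranks above d2 (score ~0.88)
```

In practice the paper's reference implementations are built on Pyserini (cited in the references below), which wraps Lucene inverted indexes for sparse models and Faiss for dense ones, but the scoring contrast is the one sketched here.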


