To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering (2403.01924v2)

Published 4 Mar 2024 in cs.CL and cs.AI

Abstract: Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource hardware. The retrieve-then-read paradigm has become ubiquitous, with model predictions grounded on relevant knowledge pieces from external repositories such as PubMed, textbooks, and UMLS. An alternative path, still under-explored but made possible by the advent of domain-specific LLMs, entails constructing artificial contexts through prompting. As a result, "to generate or to retrieve" is the modern equivalent of Hamlet's dilemma. This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine. We conduct extensive experiments on MedQA-USMLE, MedMCQA, and MMLU, incorporating a practical perspective by assuming a maximum of 24GB VRAM. MedGENIE sets a new state-of-the-art in the open-book setting of each testbed, allowing a small-scale reader to outcompete zero-shot closed-book 175B baselines while using up to 706$\times$ fewer parameters. Our findings reveal that generated passages are more effective than retrieved ones in attaining higher accuracy.

Citations (7)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/Moi39017963/status/1765268595045462500

https://twitter.com/dippatel1994/status/1765001584319074704

https://twitter.com/_reachsumit/status/1765263517526188233

To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering (2403.01924v2)

Summary

Related Papers

Tweets