Variational Open-Domain Question Answering (2210.06345v2)

Published 23 Sep 2022 in cs.CL, cs.IR, and cs.LG

Abstract: Retrieval-augmented models have proven to be effective in natural language processing tasks, yet there remains a lack of research on their optimization using variational inference. We introduce the Variational Open-Domain (VOD) framework for end-to-end training and evaluation of retrieval-augmented models, focusing on open-domain question answering and language modeling. The VOD objective, a self-normalized estimate of the Rényi variational bound, approximates the task marginal likelihood and is evaluated under samples drawn from an auxiliary sampling distribution (cached retriever and/or approximate posterior). It remains tractable, even for retriever distributions defined on large corpora. We demonstrate VOD's versatility by training reader-retriever BERT-sized models on multiple-choice medical exam questions. On the MedMCQA dataset, we outperform the domain-tuned Med-PaLM by +5.3% despite using 2,500$\times$ fewer parameters. Our retrieval-augmented BioLinkBERT model scored 62.9% on MedMCQA and 55.0% on MedQA-USMLE. Finally, we show the effectiveness of our learned retriever component in the context of medical semantic search.
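To give a feel for the kind of quantity the abstract refers to, the following is a minimal sketch of the generic Monte Carlo estimator of the Rényi variational bound, $\frac{1}{1-\alpha}\log\frac{1}{K}\sum_{i=1}^{K} w_i^{1-\alpha}$, computed from log importance weights. It is not the authors' implementation and it does not reproduce the self-normalization that VOD applies to handle retriever distributions over large corpora. The function name `renyi_bound_estimate` and the convention that `log_weights[i]` holds the log ratio of the joint likelihood to the sampling density for the i-th sampled document are assumptions made for illustration.

```python
import math


def renyi_bound_estimate(log_weights, alpha):
    """Generic Monte Carlo estimate of the Renyi variational bound.

    log_weights: hypothetical log importance weights, one per sampled document.
    alpha: Renyi order; alpha = 0 gives the importance-weighted bound and
           alpha = 1 is handled as the limiting case (the ELBO estimate).
    """
    k = len(log_weights)
    if k == 0:
        raise ValueError("at least one sample is required")
    if alpha == 1.0:
        # Limit alpha -> 1: the plain average of the log weights (ELBO).
        return sum(log_weights) / k
    scaled = [(1.0 - alpha) * lw for lw in log_weights]
    # Numerically stable log-sum-exp over the scaled log weights.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return (lse - math.log(k)) / (1.0 - alpha)


if __name__ == "__main__":
    weights = [-2.0, -1.0, -3.5]  # made-up values, for illustration only
    for a in (0.0, 0.5, 1.0):
        print(a, renyi_bound_estimate(weights, a))
```

For a fixed set of samples, lowering `alpha` toward 0 tightens the estimate toward the importance-weighted bound on the marginal log-likelihood, while `alpha = 1` recovers the looser ELBO-style average.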

Citations (8)

GitHub

GitHub - VodLM/vod: End-to-end training of Retrieval-Augmented LMs (REALM, RAG) (22 stars)

Tweets

https://twitter.com/valentinlievin/status/1785778490748313996

https://twitter.com/leafs_s_jp/status/1789859133551710470
