Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data
(2102.01226)
Abstract
In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks. In this paper, we investigate this question. We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA, and use incomplete and noisy snippets returned by a web search engine as the relevant context for each question-answering instance to convert it into a weakly-labeled MRC instance. We then propose a self-teaching paradigm to better use the generated weakly-labeled MRC instances to improve a target MRC task. Experimental results show that we can obtain +5.1% in accuracy on a multiple-choice MRC dataset, C3, and +3.8% in exact match on an extractive MRC dataset, CMRC 2018, over state-of-the-art MRC baselines, demonstrating the effectiveness of our framework and the usefulness of large-scale subject-area question-answering data for different types of machine reading comprehension tasks.
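The conversion step described in the abstract, pairing a multiple-choice question with retrieved snippets to form a weakly-labeled MRC instance, might look roughly like the sketch below. It is only an illustration under stated assumptions: the class names, the `to_weak_mrc` helper, and the `max_snippets` cutoff are hypothetical and are not taken from the paper, and the snippets are passed in directly rather than fetched from any particular search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAInstance:
    """A multiple-choice question with no accompanying passage (hypothetical schema)."""
    question: str
    options: List[str]
    answer_index: int


@dataclass
class WeakMRCInstance:
    """A QA instance plus a noisy pseudo-document built from search snippets."""
    document: str
    question: str
    options: List[str]
    answer_index: int


def to_weak_mrc(qa: QAInstance, snippets: List[str], max_snippets: int = 5) -> WeakMRCInstance:
    # Drop empty snippets and trim whitespace; the snippets may still be
    # incomplete or irrelevant, which is why the result is only weakly labeled.
    cleaned = [s.strip() for s in snippets if s.strip()]
    # Assumption: keep the first few snippets and join them into one context.
    document = "\n".join(cleaned[:max_snippets])
    return WeakMRCInstance(document, qa.question, list(qa.options), qa.answer_index)
```

The answer label is carried over unchanged from the original question; nothing guarantees that the joined snippets actually support it, which is the sense in which the resulting instance is weakly labeled.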