Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data (2102.01226v2)

Published 1 Feb 2021 in cs.CL

Abstract: In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks. In this paper, we investigate this question. We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA, and use incomplete and noisy snippets returned by a web search engine as the relevant context for each question-answering instance to convert it into a weakly-labeled MRC instance. We then propose a self-teaching paradigm to better use the generated weakly-labeled MRC instances to improve a target MRC task. Experimental results show that we can obtain +5.1% in accuracy on a multiple-choice MRC dataset, C3, and +3.8% in exact match on an extractive MRC dataset, CMRC 2018, over state-of-the-art MRC baselines. These gains demonstrate the effectiveness of our framework and the usefulness of large-scale subject-area question-answering data for different types of machine reading comprehension tasks.
