Improving Span-based Question Answering Systems with Coarsely Labeled Data (1811.02076v1)

Published 5 Nov 2018 in cs.CL

Abstract: We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains. Experiments demonstrate that the standard multi-task learning approach of sharing representations is not the most effective way to leverage coarse-grained annotations. Instead, we can explicitly model the latent fine-grained short answer variables and optimize the marginal log-likelihood directly or use a newly proposed *posterior distillation* learning objective. Since these latent-variable methods have explicit access to the relationship between the fine and coarse tasks, they result in significantly larger improvements from coarse supervision.
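
The latent-variable idea in the abstract can be sketched in a few lines. This is a minimal illustration in PyTorch, not the authors' implementation: it assumes a span-based QA model that assigns one score to each candidate answer span, and the names `span_logits` and `span_in_relevant_paragraph` are illustrative. For a coarsely labeled example, the fine-grained answer span is unobserved, so the marginal log-likelihood sums the probability of every span consistent with the paragraph-level annotation; the distillation variant shown is one plausible reading of a posterior-distillation style objective, flagged as an assumption in the comments.

```python
import torch


def coarse_marginal_log_likelihood(span_logits: torch.Tensor,
                                   span_in_relevant_paragraph: torch.Tensor) -> torch.Tensor:
    """Marginal log-likelihood for a coarsely labeled example (illustrative sketch).

    span_logits: (num_spans,) unnormalized scores for all candidate answer spans.
    span_in_relevant_paragraph: (num_spans,) boolean mask selecting spans that lie
        inside the paragraph marked as relevant (the coarse annotation).

    The fine-grained answer span is latent, so we marginalize over all spans
    consistent with the coarse label:
        log p(coarse) = logsumexp(logits[consistent]) - logsumexp(logits[all])
    """
    log_partition = torch.logsumexp(span_logits, dim=-1)
    consistent_logits = span_logits.masked_fill(~span_in_relevant_paragraph, float("-inf"))
    return torch.logsumexp(consistent_logits, dim=-1) - log_partition


def posterior_distillation_loss(span_logits: torch.Tensor,
                                span_in_relevant_paragraph: torch.Tensor) -> torch.Tensor:
    """One plausible reading of a posterior-distillation objective (an assumption,
    not the paper's exact formulation): treat the model's own posterior over spans,
    restricted to the relevant paragraph, as a soft target for the unrestricted
    span distribution and minimize the cross-entropy between the two.
    """
    teacher = torch.softmax(
        span_logits.masked_fill(~span_in_relevant_paragraph, float("-inf")), dim=-1
    ).detach()  # stop gradients through the teacher posterior
    student_log_probs = torch.log_softmax(span_logits, dim=-1)
    return -(teacher * student_log_probs).sum(dim=-1)
```

In training, a term like the marginal log-likelihood above would be combined with the standard fine-grained span loss on fully annotated examples; the distillation function is included only to make the teacher/student relationship between the coarse posterior and the span distribution concrete.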

Authors (6)
  1. Hao Cheng (190 papers)
  2. Ming-Wei Chang (44 papers)
  3. Kenton Lee (40 papers)
  4. Ankur Parikh (9 papers)
  5. Michael Collins (46 papers)
  6. Kristina Toutanova (31 papers)
Citations (1)
