Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task (1806.03578v1)

Published 10 Jun 2018 in cs.CL

Abstract: Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between the candidate and reference answers. However, these metrics can be biased for specific question types, especially questions asking for yes-no opinions and entity lists. In this paper, we adapt the metrics to better correlate n-gram overlap with human judgment for answers to these two question types. Statistical analysis demonstrates the effectiveness of our approach. Our adaptations may provide positive guidance for the development of real-scene MRC systems.
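The lexical-overlap metrics the abstract refers to can be illustrated with a minimal sketch. The functions below compute standard ROUGE-N (n-gram recall) and the clipped n-gram precision at the core of BLEU; they are generic textbook formulations for illustration, not the paper's proposed adaptations for yes-no and entity-list questions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N: n-gram recall of the candidate against the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped overlap count
    total = sum(ref.values())
    return overlap / total if total else 0.0

def bleu_n_precision(candidate, reference, n=1):
    """Clipped n-gram precision, the core term of BLEU (no brevity penalty)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# Hypothetical yes-no answer pair to show why pure overlap can mislead:
reference = "yes the battery lasts about ten hours".split()
candidate = "yes it lasts ten hours".split()
print(rouge_n(candidate, reference))        # 4/7 ≈ 0.571 (unigram recall)
print(bleu_n_precision(candidate, reference))  # 4/5 = 0.8 (unigram precision)
```

A candidate answering "no it does not last ten hours" would still share many unigrams with the reference, which is the kind of bias for yes-no questions the paper's adaptations target.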

Authors (5)
  1. An Yang
  2. Kai Liu
  3. Jing Liu
  4. Yajuan Lyu
  5. Sujian Li
Citations (37)
