
WiCE: Real-World Entailment for Claims in Wikipedia (2303.01432v2)

Published 2 Mar 2023 in cs.CL

Abstract: Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation. However, these represent a significant domain shift from existing entailment datasets, and models underperform as a result. We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim, and a minimal subset of evidence sentences that support each subclaim. To support this, we propose an automatic claim decomposition strategy using GPT-3.5 which we show is also effective at improving entailment models' performance on multiple datasets at test time. Finally, we show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.

Citations (71)

Summary

  • The paper presents WiCE as a novel dataset for fine-grained entailment evaluation using Wikipedia claims and citations.
  • It introduces Claim-Split, which leverages GPT-3.5 to break complex claims into manageable subclaims for precise annotation.
  • Analysis shows that context-aware models perform better, yet current systems still lag behind human-level verification.

The field of NLP often requires models to verify the truthfulness of statements against provided evidence, with applications ranging from fact-checking to document summarization. A new dataset, named WiCE (Wikipedia Citation Entailment), aims to tackle these challenges by offering a more realistic and fine-grained textual entailment setup.

This dataset is rooted in Wikipedia, where claims within articles are automatically identified and linked with the articles they cite as evidence. WiCE not only assesses whether a claim is supported, partially supported, or unsupported by the evidence, but also provides detailed annotations for sub-sentence units within the claims, showing exactly which parts are supported by the evidence and which are not.
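The fine-grained labeling scheme can be illustrated with a minimal sketch. The field names below are hypothetical, not the dataset's actual schema; the point is how sub-sentence judgments roll up into the three-way claim-level label:

```python
# Hypothetical sketch of a WiCE-style fine-grained annotation record.
# Field names are illustrative, not the dataset's actual schema.

def aggregate_label(subclaim_labels):
    """Roll sub-sentence judgments up to a claim-level label."""
    if all(label == "supported" for label in subclaim_labels):
        return "supported"
    if all(label == "not_supported" for label in subclaim_labels):
        return "not_supported"
    return "partially_supported"

example = {
    "claim": "X founded Y in 1990 and later sold it to Z.",
    "evidence_sentences": [],  # sentences from the cited article
    "subclaims": [
        {"text": "X founded Y in 1990.", "label": "supported",
         "supporting_sentence_ids": [2]},
        {"text": "X sold Y to Z.", "label": "not_supported",
         "supporting_sentence_ids": []},
    ],
}
claim_label = aggregate_label([s["label"] for s in example["subclaims"]])
# claim_label == "partially_supported": one subclaim is verified, one is not
```

This mirrors the paper's setup in spirit: each subclaim also points to a minimal subset of evidence sentences that support it.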

One notable innovation introduced alongside WiCE is an automatic claim decomposition strategy known as Claim-Split. Utilizing GPT-3.5, it breaks complex claims into more manageable subclaims, making the annotation process more efficient and possibly improving the performance of entailment models, as subclaims can be easier to evaluate than longer, more intricate statements.
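A Claim-Split-style pipeline can be sketched as prompt construction plus output parsing. The prompt wording and few-shot example below are assumptions for illustration; the paper's actual prompt differs:

```python
# Minimal sketch of a Claim-Split-style decomposition prompt.
# The prompt text and few-shot example are hypothetical, not the paper's.

FEW_SHOT = """\
Claim: Alice, a chemist born in Oslo, won the prize in 2001.
Subclaims:
- Alice is a chemist.
- Alice was born in Oslo.
- Alice won the prize in 2001.
"""

def build_claim_split_prompt(claim: str) -> str:
    """Ask an LLM to decompose a claim into atomic subclaims."""
    return (
        "Segment the following claim into independent subclaims, "
        "one fact per line.\n\n"
        f"{FEW_SHOT}\nClaim: {claim}\nSubclaims:\n"
    )

def parse_subclaims(completion: str) -> list[str]:
    """Read the model's bulleted completion back into a list of subclaims."""
    return [line.lstrip("- ").strip()
            for line in completion.splitlines() if line.strip()]
```

The completion would come from an LLM call (GPT-3.5 in the paper); each parsed subclaim can then be checked against the evidence independently.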

WiCE is shown to pose new challenges for current entailment models, which generally deal with shorter texts. When assessed on real-world claims from the dataset, existing models underperform because they are not yet equipped for the complex evidence verification and retrieval problems these claims involve.

The importance of context and retrieval is underscored in the data analysis. Models trained to predict entailment using chunks of the evidence, combined with context, achieve better performance than those relying solely on individual sentences. However, these systems still fall short of human-level performance.

In summary, WiCE represents a step forward in realistically assessing models' ability to determine the factual correctness of real-world claims. Its supporting tools, such as Claim-Split and the fine-grained annotations, offer ways to both enhance the dataset and potentially improve model performance, underscoring the importance of context, retrieval, and evidence granularity in the continued evolution of automated fact verification systems.

