
Document-level Claim Extraction and Decontextualisation for Fact-Checking (2406.03239v2)

Published 5 Jun 2024 in cs.CL

Abstract: Selecting which claims to check is a time-consuming task for human fact-checkers, especially from documents consisting of multiple sentences and containing multiple claims. However, existing claim extraction approaches focus more on identifying and extracting claims from individual sentences, e.g., identifying whether a sentence contains a claim or the exact boundaries of the claim within a sentence. In this paper, we propose a method for document-level claim extraction for fact-checking, which aims to extract check-worthy claims from documents and decontextualise them so that they can be understood out of context. Specifically, we first recast claim extraction as extractive summarization in order to identify central sentences from documents, then rewrite them to include necessary context from the originating document through sentence decontextualisation. Evaluation with both automatic metrics and a fact-checking professional shows that our method is able to extract check-worthy claims from documents more accurately than previous work, while also improving evidence retrieval.

Summary

  • The paper proposes a novel document-level claim extraction approach that integrates summarization and decontextualization for independent claim evaluation.
  • It employs BertSum and DocNLI models alongside a QA-driven context generation process to refine and prioritize check-worthy claims.
  • Experiments on the AVeriTeC-DCE dataset demonstrate significant precision improvements, indicating enhanced performance in automated fact-checking.

Document-level Claim Extraction and Decontextualisation for Fact-Checking

Introduction

The paper "Document-level Claim Extraction and Decontextualisation for Fact-Checking" addresses the challenge of efficiently selecting claims from documents that fact-checkers need to verify. Traditional claim extraction methodologies predominantly operate at the sentence level, identifying if a sentence contains a claim worth verifying. However, documents present a more complicated scenario where multiple claims, potentially not all central to the document, exist. This work proposes a document-level approach, integrating extractive summarization with decontextualization to streamline the claim extraction process for fact-checking.

Methodology

The approach introduced in the paper encompasses several stages, each contributing to the extraction of salient claims from documents that are interpretable without their original context.

Sentence Extraction

The claim extraction process begins with identifying central sentences within a document using BertSum, an extractive summarization model, which ranks sentences by their relevance to the document's main theme. To reduce redundancy and ensure diversity of claims, a DocNLI model then filters the ranked sentences, removing any sentence whose content is entailed by a higher-ranked one (Figure 1).

Figure 1: An overview of the document-level claim extraction framework, illustrating the process from extractive summarization to claim check-worthiness classification.
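The rank-then-filter stage described above can be sketched as follows. This is a minimal illustration, not the paper's code: `toy_score` and `toy_entails` are hypothetical stand-ins for the BertSum centrality scorer and the DocNLI entailment model.

```python
def rank_sentences(sentences, score_fn):
    """Rank sentences by centrality (BertSum supplies real scores in the paper)."""
    return sorted(sentences, key=score_fn, reverse=True)

def filter_entailed(ranked, entails):
    """Drop any sentence entailed by an already-kept, higher-ranked sentence
    (the redundancy-removal role DocNLI plays in the pipeline)."""
    kept = []
    for sent in ranked:
        if not any(entails(prev, sent) for prev in kept):
            kept.append(sent)
    return kept

# Toy stand-ins for illustration only; a real system would call the models.
def toy_score(sentence):
    return len(sentence.split())

def toy_entails(premise, hypothesis):
    tokens = lambda s: set(s.lower().replace(".", "").split())
    return tokens(hypothesis) <= tokens(premise)

doc = [
    "The mayor announced a new tax on sugary drinks starting next year.",
    "The mayor announced a new tax.",
    "The weather was sunny.",
]
central = filter_entailed(rank_sentences(doc, toy_score), toy_entails)
# The second sentence is entailed by the first, so only two sentences remain.
```

Ranking before filtering matters: it ensures the more informative sentence is kept and its entailed, less specific variants are discarded.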

Context Generation and Decontextualization

Once central sentences are determined, the system identifies ambiguous information units such as named entities or pronouns. It generates questions targeting these units and employs a QA model to retrieve the answers from the document itself. This QA-derived context then drives a seq2seq decontextualization model, which rewrites each sentence so that it can be understood independently of its source document (Figure 2).

Figure 2: Case studies of sentence decontextualisation solving linguistic problems, such as coreference resolution, global scoping, and bridging anaphora.
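The rewriting step can be sketched as a substitution driven by QA answers. This is a simplification: in the paper a seq2seq model performs the rewrite, and the `qa_answers` mapping below (including the example names) is entirely hypothetical, standing in for answers a QA model would retrieve from the source document.

```python
def decontextualise(sentence, qa_answers):
    """Rewrite a sentence by replacing each ambiguous unit with the
    QA-derived context that resolves it (a seq2seq model does this
    rewrite in the paper; simple substitution illustrates the idea)."""
    out = sentence
    for unit, resolved in qa_answers.items():
        out = out.replace(unit, resolved, 1)
    return out

sentence = "He resigned after the report was published."
# Hypothetical answers a QA model might retrieve from the source document.
qa_answers = {
    "He": "The CEO of Acme Corp",
    "the report": "the 2023 audit report",
}
claim = decontextualise(sentence, qa_answers)
# claim: "The CEO of Acme Corp resigned after the 2023 audit report was published."
```

The output claim can now be verified without reading the originating document, which is the point of decontextualisation.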

Estimation of Claim Check-worthiness

The final decontextualized sentences undergo evaluation through a classifier that assigns a check-worthiness score, ensuring only substantial claims proceed to fact-checking. The classifier distinguishes between Check-worthy Factual Sentences, Unimportant Factual Sentences, and Non-Factual Sentences based on their likelihood to present verifiable claims.
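The three-way filtering step can be sketched as follows. The label names follow the paper's taxonomy, but the argmax selection and the toy probabilities are illustrative assumptions; a trained classifier would supply the real scores.

```python
LABELS = ("Check-worthy Factual", "Unimportant Factual", "Non-Factual")

def classify(probs):
    """Return the argmax label over the three-way taxonomy;
    a trained check-worthiness classifier supplies the probabilities."""
    return max(zip(LABELS, probs), key=lambda pair: pair[1])[0]

def select_check_worthy(claims):
    """Keep only check-worthy claims, ordered by check-worthiness score.
    `claims` maps claim text to (p_check_worthy, p_unimportant, p_non_factual)."""
    kept = [(text, probs[0]) for text, probs in claims.items()
            if classify(probs) == "Check-worthy Factual"]
    return [text for text, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]

# Toy probabilities for illustration only.
claims = {
    "The CEO resigned after the 2023 audit.": (0.8, 0.1, 0.1),
    "The meeting room was painted blue.": (0.1, 0.7, 0.2),
    "I think mornings are the best time to work.": (0.1, 0.2, 0.7),
}
selected = select_check_worthy(claims)
# Only the first claim survives: the second is factual but unimportant,
# and the third is an opinion, hence non-factual.
```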

Results and Evaluation

The framework is evaluated on a newly derived dataset, AVeriTeC-DCE. It identifies central sentences with 47.8% Precision@1, a notable improvement over previous systems such as ClaimBuster. The approach also demonstrates robust performance in evidence retrieval, with precision gains indicating that decontextualized sentences substantially improve claim verification.
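For reference, Precision@k measures the fraction of the top-k extracted sentences that match the annotated central sentences. A minimal sketch (the example ranking and gold set are made up):

```python
def precision_at_k(ranked_claims, gold_claims, k=1):
    """Fraction of the top-k extracted claims that are gold central sentences."""
    top_k = ranked_claims[:k]
    return sum(1 for claim in top_k if claim in gold_claims) / k

ranked = ["claim A", "claim B", "claim C"]
gold = {"claim A", "claim C"}
p_at_1 = precision_at_k(ranked, gold, k=1)  # 1.0
p_at_2 = precision_at_k(ranked, gold, k=2)  # 0.5
```

At k=1 the metric simply asks whether the single highest-ranked claim is a gold central sentence, which is the setting behind the 47.8% figure above.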

Moreover, both automatic and human evaluations affirm the effectiveness of the decontextualization process: a fact-checking professional judged the claims extracted by the proposed method to be more central and check-worthy than those produced by previous methods.

Conclusion

This document-level claim extraction method enriches the fact-checking pipeline by prioritizing essential claims and providing contextually independent formulations. The ability to parse documents holistically enhances the relevance and accuracy of automated fact-checking systems. Future work could explore extending this methodology to multimodal content, addressing platform-specific claim dynamics, and improving cross-domain generalization capabilities.
