AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web

Published 22 May 2023 in cs.CL | (2305.13117v3)

Abstract: Existing datasets for automated fact-checking have substantial limitations, such as relying on artificial claims, lacking annotations for evidence and intermediate reasoning, or including evidence published after the claim. In this paper we introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. Through a multi-round annotation process, we avoid common pitfalls including context dependence, evidence insufficiency, and temporal leakage, and reach a substantial inter-annotator agreement of $\kappa=0.619$ on verdicts. We develop a baseline as well as an evaluation scheme for verifying claims through several question-answering steps against the open web.

Summary

The paper introduces AVeriTeC, a comprehensive dataset featuring 4,568 real-world claims with evidence-based annotations for enhanced claim verification.
It overcomes limitations in existing fact-checking resources by incorporating pre-claim evidence and detailed question-answer pairs that mimic human verification.
The authors propose an evaluation scheme that matches generated question-answer pairs to reference pairs using the Hungarian Algorithm with METEOR, and a baseline pipeline that shows how difficult open-web claim verification remains.

AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web

The paper "AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web" presents a dataset designed to overcome limitations in existing automated fact-checking (AFC) resources. AVeriTeC comprises 4,568 real-world claims drawn from fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs backed by online evidence and with a textual justification explaining how that evidence leads to the verdict. The annotation process is designed to ensure evidence sufficiency and temporal consistency.

Dataset Construction

The authors address several issues in existing fact-checking datasets: reliance on artificial claims, missing annotations for evidence and intermediate reasoning, and temporal leakage, where the evidence was published after the claim. AVeriTeC instead pairs real-world claims with web evidence that was available before the claim was made. Question-answer pairs are a central feature: they break a complex claim into smaller verification steps, mirroring how human fact-checkers work. In addition, a class label called "conflicting evidence/cherry-picking" covers cases where a claim is technically true but misleading, or where the evidence points in different directions.
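The temporal constraint described above can be illustrated with a minimal sketch. The record layout here (`Claim`, `Evidence`, and all field names) is hypothetical and is not the dataset's actual schema; the filter simply keeps evidence dated strictly before the claim, and treats undated evidence as excluded unless the caller opts in.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Evidence:
    # Hypothetical fields; the real AVeriTeC schema may differ.
    question: str
    answer: str
    source_url: str
    published: Optional[date]  # None when no publication date is known


@dataclass
class Claim:
    text: str
    claim_date: date
    evidence: List[Evidence] = field(default_factory=list)


def drop_temporal_leaks(claim: Claim, keep_undated: bool = False) -> List[Evidence]:
    """Keep only evidence published strictly before the claim was made."""
    kept = []
    for item in claim.evidence:
        if item.published is None:
            if keep_undated:
                kept.append(item)
        elif item.published < claim.claim_date:
            kept.append(item)
    return kept


# Toy example with made-up content and placeholder URLs.
example_claim = Claim(
    text="The law was passed in 2019.",
    claim_date=date(2020, 5, 1),
    evidence=[
        Evidence("When was the law passed?", "In 2019.", "https://example.org/a", date(2020, 4, 1)),
        Evidence("When was the law passed?", "In 2019.", "https://example.org/b", date(2020, 6, 1)),
        Evidence("Who proposed the law?", "Unknown.", "https://example.org/c", None),
    ],
)
safe_evidence = drop_temporal_leaks(example_claim)
```

Whether undated pages should be kept is a design choice; the stricter default here trades recall for a guarantee that no retained evidence postdates the claim.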

The dataset curation involves a multi-step annotation process conducted by a team of annotators who ensure claims are independently meaningful and fact-checkable without supplementary context. These annotators employ several quality control measures, including a "blind" quality control phase that replicates annotations to verify evidence sufficiency.

Strong Numerical Results and Methodologies

A key numerical result is the free-marginal $\kappa = 0.619$ on verdicts, which the authors describe as substantial inter-annotator agreement and which supports the reliability of the annotations. The multi-phase annotation process is modeled on how human fact-checkers work. The annotation scheme also goes beyond veracity labels: each claim carries a textual justification that explains how the annotated evidence combines to produce the verdict, something few other datasets provide.
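For readers unfamiliar with the statistic, free-marginal kappa compares observed agreement $P_o$ with a chance level fixed at $1/k$ for $k$ categories: $\kappa = (P_o - 1/k) / (1 - 1/k)$. The sketch below computes it for toy data; the function name, the label strings, and the example ratings are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def free_marginal_kappa(ratings: Sequence[Sequence[str]], num_categories: int) -> float:
    """Free-marginal multirater kappa.

    ratings: one entry per item, each holding the labels given by its raters.
    num_categories: number of labels the raters could choose from.
    """
    if num_categories < 2:
        raise ValueError("need at least two categories")
    per_item: List[float] = []
    for labels in ratings:
        n = len(labels)
        if n < 2:
            raise ValueError("each item needs at least two ratings")
        counts = Counter(labels)
        # Ordered rater pairs that agree, out of all ordered rater pairs.
        agreeing = sum(c * (c - 1) for c in counts.values())
        per_item.append(agreeing / (n * (n - 1)))
    observed = sum(per_item) / len(per_item)
    chance = 1.0 / num_categories
    return (observed - chance) / (1.0 - chance)


# Toy data: four items, two raters each, four possible verdict labels.
toy_ratings = [
    ["supported", "supported"],
    ["refuted", "refuted"],
    ["refuted", "supported"],
    ["conflicting", "conflicting"],
]
toy_kappa = free_marginal_kappa(toy_ratings, num_categories=4)
```

In this toy case the raters agree on three of four items, so $P_o = 0.75$ and $\kappa = (0.75 - 0.25) / 0.75 \approx 0.667$.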

Baseline Evaluation and Implications

Because verification runs against the open web, a system may retrieve valid evidence that differs from the annotated evidence. To evaluate models under this setting, the authors score generated question-answer pairs against the reference pairs with METEOR and use the Hungarian Algorithm to find the best one-to-one matching between the two sets. Their baseline is a pipeline of search, question generation, evidence selection, and veracity prediction, built with models such as BERT and BART. Despite retrieval difficulties, the baseline analysis shows which components matter most for claim verification and how hard veracity prediction remains.
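The matching step can be sketched as follows, under several stated assumptions. To stay dependency-free, this example swaps in a simple token-overlap F1 for METEOR and a brute-force search over permutations for the Hungarian Algorithm; the brute-force search returns the same optimal assignment but is only practical for a handful of pairs. Dividing the matched score by the number of reference items is also an assumption of this sketch. All function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Sequence


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1; a simple stand-in for METEOR (assumption)."""
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def matching_score(
    predicted: Sequence[str],
    reference: Sequence[str],
    sim: Callable[[str, str], float] = token_f1,
) -> float:
    """Best one-to-one matching score, normalized by the reference count."""
    if not reference:
        return 0.0
    scores = [[sim(p, r) for r in reference] for p in predicted]
    n_pred, n_ref = len(predicted), len(reference)
    best = 0.0
    if n_pred >= n_ref:
        # Pick a distinct prediction for every reference item.
        for chosen in permutations(range(n_pred), n_ref):
            total = sum(scores[p][r] for r, p in enumerate(chosen))
            best = max(best, total)
    else:
        # Fewer predictions than references: some references stay unmatched.
        for chosen in permutations(range(n_ref), n_pred):
            total = sum(scores[p][r] for p, r in enumerate(chosen))
            best = max(best, total)
    return best / n_ref


reference_questions = ["when was the law passed", "who made the claim"]
predicted_questions = [
    "who made the claim",
    "when was the law passed",
    "what is the weather",
]
full_score = matching_score(predicted_questions, reference_questions)
partial_score = matching_score(["who made the claim"], reference_questions)
```

For realistic set sizes, a polynomial-time solver such as SciPy's `linear_sum_assignment` (with `maximize=True`) would replace the permutation loop, and a real METEOR implementation would replace `token_f1`.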

Future Directions in AI

The release of AVeriTeC opens up work on evidence retrieval strategies and multi-hop reasoning for fact-checking. Its structured annotations, which link each claim to questions, answers, and a justification, give AFC systems training data that reflects context and intermediate reasoning rather than verdicts alone. The dataset also makes it possible to test whether large pre-trained models can generate reasoned justifications rather than bare predictions.

Conclusion

AVeriTeC aims to bridge the gap between artificial benchmarks and real-world fact-checking in AFC. It addresses longstanding problems of context dependence, evidence insufficiency, and temporal leakage, and it provides a foundation for developing and evaluating verification systems that must justify their verdicts with evidence. By combining real claims, evidence available before the claim, and explicit justifications, the dataset supports research on automated tools intended to assist fact-checkers.
