EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification (2310.09754v3)
Abstract: Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing work has largely pursued accuracy improvements while neglecting explainability, a critical capability of fact verification systems. Building an explainable fact verification system for complex multi-hop scenarios has consistently been impeded by the absence of a relevant, high-quality dataset: previous datasets either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Every instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset that performs document retrieval, explanation generation, and claim verification, validating the significance of our dataset. Furthermore, we highlight the potential of leveraging large language models (LLMs) for the fact verification task. We hope our dataset makes a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.
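The baseline described in the abstract chains three stages: document retrieval, explanation generation, and claim verification. The sketch below is a hypothetical, minimal illustration of that pipeline shape; the function bodies (term-overlap retrieval, title-concatenation explanations, a toy verifier) are stand-ins invented for this example, not the authors' actual models.

```python
# Hypothetical three-stage pipeline mirroring the baseline's structure:
# (1) retrieve documents, (2) generate an explanation, (3) verify the claim.
# All component logic below is an illustrative stand-in.

LABELS = ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")

def retrieve_documents(claim, corpus, k=2):
    """Rank documents by naive term overlap with the claim (a stand-in
    for a real retriever) and keep the top-k as the reasoning hops."""
    claim_terms = set(claim.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(claim_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [title for title, _ in scored[:k]]

def generate_explanation(claim, docs):
    """Stand-in summarizer: stitch the retrieved titles into a short
    description of the reasoning path."""
    return f"To verify '{claim}', combine evidence from: " + " -> ".join(docs)

def verify_claim(explanation, docs):
    """Toy verifier: predict SUPPORTS only when evidence was retrieved."""
    return "SUPPORTS" if docs else "NOT ENOUGH INFO"

def fact_check(claim, corpus):
    """Run the full pipeline and return label, explanation, and hops."""
    docs = retrieve_documents(claim, corpus)
    explanation = generate_explanation(claim, docs)
    label = verify_claim(explanation, docs)
    return {"label": label, "explanation": explanation, "documents": docs}
```

In a real system each stage would be a learned component (e.g. a dense retriever, a summarization model, and an entailment classifier), but the interface between stages stays the same: the explanation is produced from the retrieved documents and then consumed by the verifier.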
- Where is your evidence: Improving fact-checking by justification modeling. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 85–90.
- Generating Fact Checking Explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7352–7364, Online. Association for Computational Linguistics.
- Language models are few-shot learners.
- e-SNLI: Natural language inference with natural language explanations. Advances in Neural Information Processing Systems, 31.
- Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1870–1879, Vancouver, Canada. Association for Computational Linguistics.
- Scaling instruction-finetuned language models. CoRR, abs/2210.11416.
- Fool me twice: Entailment from Wikipedia gamification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 352–365.
- Summarize-then-answer: Generating concise explanations for multi-hop reading comprehension. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6064–6080.
- Exploring listwise evidence reasoning with t5 for fact verification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 402–410.
- Yichen Jiang and Mohit Bansal. 2019. Avoiding reasoning shortcuts: Adversarial evaluation, training, and model development for multi-hop QA. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2726–2736.
- HoVer: A dataset for many-hop fact extraction and claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3441–3460.
- Generating fluent fact checking explanations with unsupervised post-editing. Information, 13(10):500.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186.
- Neema Kotonya and Francesca Toni. 2020. Explainable automated fact-checking for public health claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7740–7754.
- A multi-level attention model for evidence-based fact checking. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2447–2460.
- The science of fake news. Science, 359(6380):1094–1096.
- BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7342–7351, Online. Association for Computational Linguistics.
- QCRI’s COVID-19 Disinformation Detector: A System to Fight the COVID-19 Infodemic in Social Media. arXiv:2204.03506 [cs].
- Multi-hop fact checking of political claims. In IJCAI, pages 3892–3898. ijcai.org.
- Training language models to follow instructions with human feedback.
- Fact-checking complex claims with program-guided reasoning.
- Improving language understanding by generative pre-training.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- "why should I trust you?": Explaining the predictions of any classifier. In KDD, pages 1135–1144. ACM.
- Towards debiasing fact verification models. In EMNLP.
- Gautam Kishore Shahi and Durgesh Nandini. 2020. Fakecovid- A multilingual cross-domain fact check news dataset for COVID-19. In ICWSM Workshops.
- Pepa Gencheva, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. 2017. A Context-Aware Approach for Detecting Worth-Checking Claims in Political Debates. In RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning, pages 267–276. Incoma Ltd., Shoumen, Bulgaria.
- Dominik Stammbach and Elliott Ash. 2020. e-fever: Explanations and summaries for automated fact checking. Proceedings of the 2020 Truth and Trust Online (TTO 2020), pages 32–43.
- FEVER: a large-scale dataset for fact extraction and verification. In NAACL-HLT, pages 809–819. Association for Computational Linguistics.
- Evaluating adversarial attacks against multiple fact verification systems. In EMNLP.
- Fact or fiction: Verifying scientific claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7534–7550, Online. Association for Computational Linguistics.
- William Yang Wang. 2017. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 422–426, Vancouver, Canada. Association for Computational Linguistics.
- Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837. Curran Associates, Inc.
- MenatQA: A new dataset for testing the temporal comprehension and reasoning abilities of large language models.
- Answering complex open-domain questions with multi-hop dense retrieval. In ICLR. OpenReview.net.
- HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380.
- Reasoning over semantic-level graph for fact checking. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6170–6180.
- GEAR: Graph-based evidence aggregating and reasoning for fact verification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 892–901.