Emergent Mind

Abstract

The task of automatic multi-hop fact verification has gained significant attention in recent years. Despite impressive results, even well-designed models perform poorly on out-of-domain data. One possible solution is to augment the training data with counterfactuals, which are generated by minimally altering the causal features of the original data. However, current counterfactual data augmentation techniques fail to handle multi-hop fact verification because they cannot preserve the complex logical relationships within multiple correlated texts. In this paper, we overcome this limitation by developing a rationale-sensitive method that generates linguistically diverse, label-flipping counterfactuals while preserving logical relationships. Specifically, diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture, and checking and filtering modules are proposed to regularize the counterfactual data with logical relations and flipped labels. Experimental results show that the proposed approach outperforms the SOTA baselines and can generate linguistically diverse counterfactual data without disrupting their logical relationships.

Figure: RACE pipeline processing for SUPPORTS and REFUTES instances, with the methods specific to each highlighted in red.

Overview

  • The paper introduces RACE, a novel counterfactual data augmentation method tailored to multi-hop fact verification, designed to improve model performance on out-of-domain data.

  • RACE employs an Explain-Edit-Generate architecture that preserves logical coherence while generating diverse label-flipping counterfactuals, improving upon previous data augmentation techniques.

  • The methodology includes rationale extraction from evidence, evidence editing based on causative entities, claim generation via a pre-trained model, and a filtering process to ensure semantic fidelity.

  • Experimental evaluations show that RACE outperforms state-of-the-art baselines, improving model robustness and out-of-domain generalization.

Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification

Overview

The task of multi-hop fact verification necessitates discerning the veracity of a claim based on evidence that spans multiple documents. Despite considerable progress in this area, models often struggle with out-of-domain (OOD) data, primarily due to their over-reliance on spurious correlations. A promising approach to mitigate this issue is Counterfactual Data Augmentation (CDA), which involves augmenting training data with instances generated by minimally altering causal features of the original data. However, existing CDA techniques fall short in addressing the complexities of multi-hop fact verification, mainly due to their inability to preserve intricate logical relationships among multiple texts. This paper introduces a novel approach, termed RACE (RAtionale-sensitive Counterfactual gEneration), which effectively generates label-flipping, linguistically diverse counterfactuals without compromising logical coherence.
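To make the CDA idea concrete, consider a hypothetical multi-hop example (invented for illustration; not drawn from the paper's datasets). A counterfactual is produced by minimally altering one causal feature in the evidence, which flips the label, while the bridging entity that links the two evidence sentences is kept so the multi-hop structure survives:

```python
# Hypothetical counterfactual pair for multi-hop fact verification.
# The bridging entity ("Christopher Nolan") connects the two evidence
# sentences; only one causal feature (the birthplace) is edited.

original = {
    "claim": "The director of Inception was born in London.",
    "evidence": [
        "Inception was directed by Christopher Nolan.",  # hop 1: bridge entity
        "Christopher Nolan was born in London.",         # hop 2: causal fact
    ],
    "label": "SUPPORTS",
}

counterfactual = {
    "claim": "The director of Inception was born in London.",
    "evidence": [
        "Inception was directed by Christopher Nolan.",  # bridge preserved
        "Christopher Nolan was born in Paris.",          # causal fact edited
    ],
    "label": "REFUTES",
}

print(original["label"], "->", counterfactual["label"])
```

Naive word-level CDA methods would happily rewrite the bridging entity itself, severing the link between the two sentences; preserving it is exactly the logical-relationship constraint that RACE is designed to enforce.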

Methodology

RACE adopts an Explain-Edit-Generate architecture to generate counterfactuals, following these key stages:

  1. Rationale Extraction: The approach begins by extracting rationales from multi-hop evidence using an explainability method. These rationales capture both the logical correlation within the evidence and the factual relationship between the claim and the evidence.
  2. Evidence Editing: The evidence is then edited based on identified causative entities within the rationales through a set of entity-based rules. This editing aims to create evidence that is factually distinct from the original while preserving the logical structure necessary for multi-hop reasoning.
  3. Claim Generation: Leveraging a pre-trained generation model, RACE synthesizes new claims from the edited evidence. Constrained beam search decoding, guided by entities within the rationales, ensures that the generated claims are both linguistically diverse and logically consistent with the edited evidence.
  4. Filtering: Finally, a filtering stage refines the generated claims based on semantic and topic fidelity, ensuring minimal perturbation from the original claims while achieving label flipping.
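The Edit and Filter stages can be sketched as follows. This is a toy stand-in, not the paper's implementation: RACE uses an explainability model to locate causative entities, a pre-trained generator with constrained beam search for claims, and learned scoring for filtering, whereas here entity substitution is a simple string swap and "semantic fidelity" is approximated with a stdlib string-similarity ratio.

```python
# Minimal sketch of RACE's Edit and Filter stages (toy stand-ins for the
# paper's learned components; entity_map and thresholds are assumptions).

import difflib


def edit_evidence(evidence: str, entity_map: dict) -> str:
    """Swap causative entities for substitutes, leaving the sentence's
    logical (bridging) structure intact."""
    for original_entity, substitute in entity_map.items():
        evidence = evidence.replace(original_entity, substitute)
    return evidence


def keep_candidate(original_claim: str, candidate_claim: str,
                   min_sim: float = 0.5, max_sim: float = 0.95) -> bool:
    """Keep generated claims that stay on-topic (similarity above min_sim)
    but are perturbed enough to plausibly flip the label (below max_sim)."""
    sim = difflib.SequenceMatcher(None, original_claim, candidate_claim).ratio()
    return min_sim <= sim <= max_sim


evidence = "Inception was directed by Christopher Nolan."
edited = edit_evidence(evidence, {"Christopher Nolan": "Denis Villeneuve"})
print(edited)  # Inception was directed by Denis Villeneuve.

claim = "Christopher Nolan directed Inception."
candidate = "Denis Villeneuve directed Inception."
print(keep_candidate(claim, candidate))
```

In the actual pipeline the candidate claim would come from the generation model conditioned on the edited evidence, and the filter would also check topic fidelity, not just surface similarity.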

Experimental evaluations demonstrate RACE's superiority over state-of-the-art baselines in generating high-quality counterfactuals, leading to notable improvements in model performance across a range of datasets.

Implications and Future Directions

The development of RACE marks a significant advancement in data augmentation for multi-hop fact verification. By preserving logical relationships through carefully designed evidence editing and rationale-informed claim generation, RACE addresses a critical gap in existing CDA techniques. This approach not only enhances model robustness against spurious correlations but also significantly improves OOD generalization.

Looking ahead, the effectiveness of RACE opens up several avenues for future research in AI and natural language processing:

  • Generalization to Other Domains: While this study focuses on multi-hop fact verification, the underlying principles of RACE could be adapted for other complex reasoning tasks requiring the preservation of logical structure in augmented data.
  • Integration with Larger Language Models: As language models continue to grow in size and capability, integrating RACE with these models could further enhance the quality and diversity of generated counterfactuals.
  • Investigation into Other Forms of Logical Reasoning: Exploring RACE's applicability to tasks requiring different forms of logical reasoning (e.g., causal inference, abductive reasoning) could further expand our understanding of effective data augmentation strategies.

In conclusion, RACE represents a methodological breakthrough in counterfactual data augmentation for multi-hop fact verification, promising not only immediate improvements in model resilience but also laying the groundwork for future innovations in AI research.
