SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

Published 18 Nov 2021 in cs.CL | (2111.09525v1)

Abstract: In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level), and inconsistency detection (document level). We provide a highly effective and light-weight method called SummaCConv that enables NLI models to be successfully used for this task by segmenting documents into sentence units and aggregating scores between pairs of sentences. On our newly introduced benchmark called SummaC (Summary Consistency) consisting of six large inconsistency detection datasets, SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5% point improvement compared to prior work. We make the models and datasets available: https://github.com/tingofurro/summac

Summary

The paper shows that segmenting documents into sentences aligns document-level inconsistency detection with the sentence-level granularity at which NLI models operate.
It introduces two model variants, a zero-shot aggregation model and a convolutional model, with the latter reaching 74.4% balanced accuracy on a benchmark of six datasets.
The study reports throughput of over 430 documents per minute; possible extensions include multi-hop reasoning and ensemble methods.

An Analysis of SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

The paper "SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization" examines how Natural Language Inference (NLI) models can be used to identify inconsistencies in text summarization. The work is motivated by the basic requirement that a summary faithfully represent its input document, a requirement that current summarization models often fail to meet because they introduce inconsistencies such as inversions or hallucinations.

Technical Approach and Novel Contributions

The core contribution is a way around the obstacle that hindered earlier attempts to use NLI models for inconsistency detection: NLI datasets are sentence-level, whereas inconsistency detection operates at the document level. The method segments documents into sentence-level units, scores pairs of sentences with the NLI model, and aggregates those pairwise scores, so that the model is always applied at the granularity it was trained on.
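The segmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the regex sentence splitter is deliberately naive, and `toy_entailment` is a hypothetical word-overlap stand-in for a real NLI model, used only so the example runs without external dependencies. All function names here are assumptions made for illustration.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model (NOT a real entailment score):
    the fraction of hypothesis words that also occur in the premise."""
    def words(s: str) -> set:
        return set(re.findall(r"\w+", s.lower()))

    hyp = words(hypothesis)
    return len(hyp & words(premise)) / len(hyp) if hyp else 0.0


def pair_matrix(
    document: str,
    summary: str,
    score_fn: Callable[[str, str], float] = toy_entailment,
) -> List[List[float]]:
    """Score every (document sentence, summary sentence) pair.

    Rows index document sentences, columns index summary sentences.
    """
    doc_sents = split_sentences(document)
    sum_sents = split_sentences(summary)
    return [[score_fn(d, s) for s in sum_sents] for d in doc_sents]


document = "The cat sat on the mat. It was raining outside. The dog barked."
summary = "The cat sat on the mat. The dog slept."
matrix = pair_matrix(document, summary)
for row in matrix:
    print([round(x, 2) for x in row])
```

In practice `score_fn` would wrap a trained NLI model; the point of the sketch is only the shape of the computation, a matrix with one score per sentence pair.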

Two variants of the model are introduced: a zero-shot aggregation model (SummaCZS) and a convolutional model (SummaCConv). The former applies max and mean operations to the sentence-pair entailment scores to derive a consistency score from an NLI model without any additional training. The latter instead uses a convolutional neural network to aggregate these scores and is trained on a synthetic dataset to optimize consistency detection.
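The two aggregation strategies described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code. For the zero-shot variant, it assumes the max is taken over document sentences and the mean over summary sentences. For the learned variant, it assumes each summary sentence's scores are binned into a histogram and combined with a weight vector, which is a simplified stand-in for a trained convolutional layer. The weights in the example are hand-picked, not trained, and scores are assumed to lie in [0, 1].

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows: document sentences, columns: summary sentences


def zero_shot_score(matrix: Matrix) -> float:
    """Max over document sentences for each summary sentence, then mean."""
    n_summary = len(matrix[0])
    col_max = [max(row[j] for row in matrix) for j in range(n_summary)]
    return sum(col_max) / n_summary


def histogram(values: Sequence[float], n_bins: int) -> List[int]:
    """Count values per equal-width bin on [0, 1]; out-of-range values are clamped."""
    counts = [0] * n_bins
    for v in values:
        idx = min(max(int(v * n_bins), 0), n_bins - 1)
        counts[idx] += 1
    return counts


def learned_score(matrix: Matrix, weights: Sequence[float]) -> float:
    """Weighted sum of each summary sentence's score histogram, then mean.

    `weights` plays the role of a learned kernel spanning all bins
    (assumption: in a real system these would come from training).
    """
    n_summary = len(matrix[0])
    per_sentence = []
    for j in range(n_summary):
        column = [row[j] for row in matrix]
        hist = histogram(column, len(weights))
        per_sentence.append(sum(w * c for w, c in zip(weights, hist)))
    return sum(per_sentence) / n_summary


scores = [
    [0.9, 0.1],
    [0.2, 0.4],
    [0.1, 0.3],
]
print(zero_shot_score(scores))                      # approximately 0.65
print(learned_score(scores, [0.0, 0.0, 0.0, 1.0]))  # 0.5
```

The max operation keeps only the single best-supporting document sentence for each summary sentence, whereas a histogram-based aggregator can also take into account how many document sentences give weak or strong support.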

The efficacy of these models is demonstrated on the SummaC Benchmark, a newly compiled benchmark that brings together six large inconsistency detection datasets. On this benchmark, SummaCConv achieves a balanced accuracy of 74.4%, a 5-percentage-point improvement over prior work.
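Balanced accuracy, the metric quoted above, is the average of the accuracy on consistent examples and the accuracy on inconsistent examples, which makes it less sensitive to class imbalance than plain accuracy. A minimal sketch, assuming binary labels where 1 means consistent and 0 means inconsistent (the label encoding and function name are assumptions for illustration):

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels (1 = consistent, 0 = inconsistent)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    true_neg = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return 0.5 * (true_pos / pos + true_neg / neg)


# A classifier that always predicts "consistent" on an imbalanced set:
labels = [1, 1, 1, 1, 0]
predictions = [1, 1, 1, 1, 1]
print(balanced_accuracy(labels, predictions))  # 0.5, although plain accuracy is 0.8
```

The example shows why the metric suits datasets with skewed label distributions: a trivial majority-class predictor scores no better than chance.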

Results and Evaluation

The paper reports significant improvements over previous inconsistency detection methods, with notable advances across multiple datasets. The convolutional model (SummaCConv) outperforms alternative approaches, including the parsing-based DAE and QAG-based QuestEval, particularly in datasets with high inconsistency prevalence.

The significance of this work lies not only in its quantitative results but also in its method. By removing the granularity mismatch through sentence-level segmentation, the authors show that existing NLI models can be used in practice to check the consistency of summaries.

The models are also fast, processing upwards of 430 documents per minute, which indicates viability for large-scale applications. Furthermore, the comparative analysis of different NLI architectures and training datasets offers guidance on which configuration to choose for similar tasks.

Implications and Future Directions

This research underscores the potential of NLI models beyond their original scope, aligning with broader trends in NLP wherein cross-task applications yield enhanced utility. The contribution lies in reframing how entailment can be interpreted and leveraged in summarization, heralding improvements in practical application and theoretical understanding of document-level to sentence-level mappings in AI systems.

Future work could explore refining these approaches, such as integrating multi-hop reasoning to further resolve complex inconsistencies or leveraging multi-model ensemble techniques for enhanced score aggregation. Beyond summarization, adapting these methods to other NLP tasks, such as text simplification or context-aware translation, presents a promising avenue of research.

In summary, the paper presents a significant advancement in leveraging NLI models for summarization inconsistency detection, contributing valuable methodologies and insights to the field, with potential applications extending well beyond the task at hand.
