A large annotated corpus for learning natural language inference (1508.05326v1)

Published 21 Aug 2015 in cs.CL

Abstract: Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

Summary

The paper presents the SNLI corpus, a collection of 570K human-written, human-labeled sentence pairs for training and evaluating NLI models.
It details a crowdsourcing and validation methodology in which 98% of the validated pairs reached a three-annotator consensus.
An LSTM pretrained on SNLI reached 80.8% accuracy when transferred to the SICK entailment task, demonstrating the corpus’s usefulness beyond its own test set.

An Annotated Corpus for Learning Natural Language Inference

In the paper "A large annotated corpus for learning natural language inference," Bowman et al. introduce the Stanford Natural Language Inference (SNLI) corpus. This resource comprises 570,152 labeled sentence pairs intended to support research in natural language inference (NLI). SNLI stands apart from previous resources in two ways: its scale, which is two orders of magnitude larger than any earlier corpus of its type, and its human-written sentence pairs, which make it practical to train data-hungry models of semantic representation.

Introduction and Motivation

Understanding entailment and contradiction is fundamental to natural language understanding (NLU). Characterizing these relations computationally underpins numerous applications, including semantic parsing, information retrieval, and commonsense reasoning. Earlier work on NLI has drawn on symbolic logic, knowledge bases, and neural networks. Progress has nevertheless been held back by the shortcomings of existing corpora, which are too small, algorithmically generated, or marred by indeterminate annotations.

Corpus Construction

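Bowman et al. addressed these limitations by designing SNLI around three goals: size, quality, and resolution of indeterminacy. The sentence pairs were crowdsourced on Amazon Mechanical Turk. Each premise is an image caption drawn from the Flickr30k corpus, which is the grounded, captioning-based task mentioned in the abstract. For each premise, a contributor wrote three alternate hypothesis sentences under specific instructions: one that is definitely true given the premise, one that might be true, and one that is definitely false. These correspond to the three labels entailment, neutral, and contradiction.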

SNLI's scale addresses a limitation of smaller datasets such as those from the Recognizing Textual Entailment (RTE) challenge tasks: it is large enough to train data-intensive, parameter-rich models such as neural networks. A subset of the collected pairs then went through a secondary validation phase to assess annotation reliability. In that subset, 98% of pairs reached a three-annotator consensus and 58% received unanimous agreement, which supports the corpus's reliability for NLI tasks.
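The two agreement figures above can be made concrete with a small sketch. It assumes each validated pair carries five labels (the author's label plus four from validators) and that "three-annotator consensus" means at least three of those labels agree. The function names `gold_label` and `agreement_stats` are hypothetical and are not taken from the paper or any released tooling.

```python
from collections import Counter


def gold_label(annotations, min_votes=3):
    """Return the label chosen by at least `min_votes` annotators, else None.

    Assumption: with five labels over three classes, at most one label
    can reach three votes, so the winner is unambiguous when it exists.
    """
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_votes else None


def agreement_stats(all_annotations, min_votes=3):
    """Return (consensus_rate, unanimous_rate) over a list of label lists."""
    total = len(all_annotations)
    if total == 0:
        return 0.0, 0.0
    consensus = sum(gold_label(a, min_votes) is not None for a in all_annotations)
    unanimous = sum(len(set(a)) == 1 for a in all_annotations)
    return consensus / total, unanimous / total


# Made-up annotations for illustration only; not data from the corpus.
example = [
    ["entailment"] * 5,
    ["neutral", "neutral", "neutral", "entailment", "contradiction"],
    ["neutral", "neutral", "entailment", "entailment", "contradiction"],
    ["contradiction"] * 4 + ["neutral"],
]
consensus_rate, unanimous_rate = agreement_stats(example)
```

Under this reading, a pair with no three-way majority (the third example) gets no gold label, which is one way the "resolution of indeterminacy" goal can be reflected in the data.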

Model Evaluation

The paper evaluates several NLI models using the SNLI corpus:

Excitement Open Platform Models: These include a basic edit-distance model and a classifier-based model enhanced with lexical resources (WordNet, VerbOcean). The classifier with lexical resources outperformed others, achieving 75% accuracy on SNLI.
Lexicalized Classifier: This model relies on cross-bigram features and lexical overlap, yielding up to 78.2% accuracy. Ablation studies show significant performance drops without lexicalized features, highlighting their utility in large datasets.
Neural Network Models: Evaluations include a baseline sum-of-words model, a recurrent neural network (RNN), and a Long Short-Term Memory (LSTM) RNN. The LSTM achieved comparable performance to the lexicalized classifier with a test accuracy of 77.6%.
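To illustrate what "lexicalized" means for the second model, here is a minimal sketch of cross-unigram and cross-bigram indicator features alongside two unlexicalized ones (word overlap and length difference). It is a simplification under stated assumptions: tokenization is plain whitespace splitting, every premise n-gram is paired with every hypothesis n-gram with no part-of-speech filtering, and the function and feature names are hypothetical rather than the authors' implementation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(premise, hypothesis):
    """Build a sparse feature dict for one premise/hypothesis pair."""
    p_tokens = premise.lower().split()
    h_tokens = hypothesis.lower().split()
    features = {}

    # Unlexicalized features: they ignore which specific words occur.
    shared = set(p_tokens) & set(h_tokens)
    features["overlap_count"] = float(len(shared))
    features["length_diff"] = float(len(h_tokens) - len(p_tokens))

    # Lexicalized features: one indicator per (premise n-gram, hypothesis n-gram).
    for n, prefix in ((1, "cross_unigram"), (2, "cross_bigram")):
        for p_gram in ngrams(p_tokens, n):
            for h_gram in ngrams(h_tokens, n):
                key = f"{prefix}:{' '.join(p_gram)}|{' '.join(h_gram)}"
                features[key] = 1.0
    return features


# Made-up sentences for illustration only.
feats = lexical_features("A dog runs", "A dog sleeps")
```

The number of distinct cross features grows with the vocabulary, which is consistent with the observation above that such features become useful mainly when the training set is large; the resulting dict could be fed to any sparse linear classifier.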

Transfer Learning

The authors further demonstrate the potential of transfer learning by initializing an LSTM with parameters pretrained on SNLI and then training it on the SICK entailment task. This pretrained model achieved 80.8% accuracy on SICK, outperforming standard models and approaching state-of-the-art results. This suggests that SNLI-trained models capture substantial domain-general semantic knowledge that applies beyond the corpus’s original scope.
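The initialization step can be sketched in a framework-independent way. The example below assumes model parameters are stored as a dict of nested lists keyed by name, and it copies a pretrained value only when the name and shape match the target model. The parameter names, the matching rule, and the helper functions are all hypothetical; the paper's own training code is not reproduced here.

```python
import copy


def shape(value):
    """Return the shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def init_from_pretrained(target_params, pretrained_params):
    """Return (merged_params, reused_names) without mutating the inputs.

    A pretrained parameter is reused only if the target has a parameter
    with the same name and the same shape; everything else keeps the
    target's fresh initialization.
    """
    merged = copy.deepcopy(target_params)
    reused = []
    for name, value in pretrained_params.items():
        if name in merged and shape(merged[name]) == shape(value):
            merged[name] = copy.deepcopy(value)
            reused.append(name)
    return merged, reused


# Toy parameters for illustration only.
pretrained = {"lstm.W": [[1.0, 2.0], [3.0, 4.0]], "classifier.W": [[9.0]]}
fresh = {"lstm.W": [[0.0, 0.0], [0.0, 0.0]], "classifier.W": [[0.0, 0.0, 0.0]]}
merged, reused = init_from_pretrained(fresh, pretrained)
```

After this step the merged parameters would be trained further on the target task's training data, so the pretrained values act as a starting point rather than a frozen encoder.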

Implications and Future Work

The introduction of SNLI has both theoretical and practical implications. A large-scale, human-annotated dataset makes it possible to develop and evaluate more sophisticated NLI models. The empirical results underscore the effectiveness of neural networks at learning semantic representations when enough training data is available, which can significantly advance NLU.

Future work could apply these models to other NLU domains and incorporate more sophisticated mechanisms for semantic and syntactic representation. Extending the corpus to additional languages or domains could also broaden its applicability.

In conclusion, the SNLI corpus represents a substantial step forward for natural language inference research, enabling the training of data-intensive models that can improve the understanding and processing of natural language semantics.