Augmenting Data with Mixup for Sentence Classification: An Empirical Study

Published 22 May 2019 in cs.CL and cs.AI | (1905.08941v1)

Abstract: Mixup, a recently proposed data augmentation method through linearly interpolating inputs and modeling targets of random samples, has demonstrated its capability of significantly improving the predictive accuracy of the state-of-the-art networks for image classification. However, how this technique can be applied to NLP tasks, and how effective it is on them, have not been investigated. In this paper, we propose two strategies for the adaptation of Mixup to sentence classification: one performs interpolation on word embeddings and another on sentence embeddings. We conduct experiments to evaluate our methods using several benchmark datasets. Our studies show that such interpolation strategies serve as an effective, domain independent data augmentation approach for sentence classification, and can result in significant accuracy improvement for both CNN and LSTM models.

Summary

The paper introduces two Mixup variants—wordMixup and senMixup—that generate synthetic data through linear interpolation of embeddings.
Experimental results on five benchmark datasets show notable accuracy improvements: in the best cases, CNN models exceed their baselines by more than 3.3% and LSTM models by more than 5%.
The augmentation approach acts as an effective regularizer, reducing overfitting and enhancing model robustness in sentence classification tasks.

This paper examines the application of Mixup, a data augmentation method originally shown to enhance image classification models significantly, to the domain of NLP, specifically for sentence classification tasks. It describes two strategies for adapting Mixup to NLP tasks: performing interpolation on word embeddings and sentence embeddings. Through a series of experiments on diverse benchmark datasets, the authors demonstrate that these interpolation strategies can effectively serve as model regularizers and enhance predictive accuracy for CNN and LSTM models.

Methodology

The Mixup technique, originally applied in the image domain, generates synthetic training data by linearly interpolating randomly chosen pairs of training samples together with their associated targets. Inspired by its success in computer vision, the authors adapt this concept to sentence classification by proposing two variants: wordMixup and senMixup.
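
The interpolation step itself is small enough to sketch directly. The snippet below is a minimal illustration rather than the authors' code: it assumes one-hot targets and a mixing weight drawn from a Beta(alpha, alpha) distribution, as in the original Mixup formulation, and the function name `mixup_pair` and the toy vectors are hypothetical.

```python
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=None):
    """Blend two examples and their one-hot targets with a single mixing weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing weight in [0, 1]
    x_mix = lam * x_i + (1.0 - lam) * x_j   # interpolated input
    y_mix = lam * y_i + (1.0 - lam) * y_j   # interpolated (soft) target
    return x_mix, y_mix, lam

# Toy example: two 3-dimensional inputs from different classes.
rng = np.random.default_rng(0)
x_a, y_a = np.array([1.0, 0.0, 2.0]), np.array([1.0, 0.0])
x_b, y_b = np.array([0.0, 4.0, 2.0]), np.array([0.0, 1.0])
x_mix, y_mix, lam = mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=rng)
```

The same weight is applied to inputs and targets, so the soft target still sums to one and records how much of each class went into the synthetic example.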

WordMixup: This variant performs interpolation directly in the word embedding space. Linearly interpolating the word embeddings of two different sentences yields new synthetic examples, which are then used for training.
SenMixup: Here, interpolation occurs at the sentence embedding level. Each sentence is first passed through an encoder such as a CNN or LSTM, and the resulting sentence representations are then interpolated.
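
The difference between the two variants is where the interpolation happens, which the following sketch tries to make concrete. It is an illustration under stated assumptions, not the paper's implementation: sentences are assumed to be zero-padded to a common length before word-level mixing, `toy_encoder` is a hypothetical max-pooling stand-in for a real CNN or LSTM encoder, and the mixing weight `lam` is fixed by hand instead of being sampled.

```python
import numpy as np

def pad_to_length(emb, max_len):
    """Zero-pad a (num_words, dim) embedding matrix to (max_len, dim)."""
    padded = np.zeros((max_len, emb.shape[1]), dtype=emb.dtype)
    padded[: emb.shape[0]] = emb
    return padded

def word_mixup(emb_a, emb_b, lam):
    """wordMixup sketch: interpolate position by position in word-embedding space."""
    max_len = max(emb_a.shape[0], emb_b.shape[0])
    return lam * pad_to_length(emb_a, max_len) + (1.0 - lam) * pad_to_length(emb_b, max_len)

def toy_encoder(emb):
    """Hypothetical stand-in for a CNN/LSTM encoder: max-pool over word positions."""
    return emb.max(axis=0)

def sen_mixup(emb_a, emb_b, lam, encoder=toy_encoder):
    """senMixup sketch: encode each sentence first, then interpolate the sentence vectors."""
    return lam * encoder(emb_a) + (1.0 - lam) * encoder(emb_b)

def mix_labels(y_a, y_b, lam):
    """Both variants interpolate the one-hot targets with the same weight."""
    return lam * y_a + (1.0 - lam) * y_b

rng = np.random.default_rng(0)
sent_a = rng.normal(size=(5, 8))   # 5 words, 8-dimensional embeddings
sent_b = rng.normal(size=(3, 8))   # 3 words, 8-dimensional embeddings
lam = 0.7

mixed_words = word_mixup(sent_a, sent_b, lam)      # fed to the encoder afterwards
mixed_sentence = sen_mixup(sent_a, sent_b, lam)    # fed to the classifier layer
mixed_target = mix_labels(np.array([1.0, 0.0]), np.array([0.0, 1.0]), lam)
```

In this sketch wordMixup produces a full sequence that the encoder still has to process, while senMixup produces a single vector that goes straight to the final classification layer.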

Experimental Results

The empirical study uses five benchmark datasets (TREC, MR, SST-1, SST-2, and Subj) to validate the proposed methods. The results are promising: both wordMixup and senMixup improve model performance across a variety of experimental conditions. The accuracy gains are most pronounced on datasets with more than two target classes, such as TREC and SST-1.

CNN Models: WordMixup and senMixup showed considerable improvements in predictive accuracy with CNNs, with the most notable improvements on the SST-1 and MR datasets, exceeding baseline performance by more than 3.3%.
LSTM Models: These techniques also benefited LSTM models, particularly enhancing performance on datasets with more target classes, reflecting an increase of over 5% in certain instances.
Regularization Effects: Both Mixup variants exhibited strong regularization capabilities, maintaining higher training loss levels that appeared crucial in providing continuous training signals and thus preventing overfitting.
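
One way to see why the training loss stays higher under Mixup is that cross-entropy against a soft, interpolated target cannot reach zero: its minimum is the entropy of the target itself. The snippet below is a small numerical illustration of that property, not an analysis taken from the paper; the 0.7/0.3 target and the helper name `cross_entropy` are illustrative choices.

```python
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return float(-np.sum(target * np.log(pred + eps)))

hard_target = np.array([1.0, 0.0])    # ordinary one-hot label
mixed_target = np.array([0.7, 0.3])   # Mixup label with mixing weight 0.7

# A perfect prediction drives the loss on a hard label to (numerically) zero.
loss_hard = cross_entropy(hard_target, hard_target)

# Even the best possible prediction for a mixed label leaves a positive loss,
# equal to the entropy of the mixed target.
loss_mixed_best = cross_entropy(mixed_target, mixed_target)

# An over-confident prediction is penalized more heavily still.
loss_mixed_confident = cross_entropy(mixed_target, np.array([0.99, 0.01]))
```

Because the loss on mixed examples is bounded away from zero, the model keeps receiving gradient signal and is discouraged from making over-confident predictions, which is consistent with the regularization effect described above.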

Implications and Future Directions

The implications of applying Mixup to NLP through these variants are noteworthy. Mixup offers a domain-independent, computationally cheap augmentation strategy that does not rely on the hand-crafted, label-preserving data transformations common in traditional NLP augmentation methods.

Future work could explore extended implementations of Mixup, such as Manifold Mixup and AdaMixup, which have exhibited potential in addressing manifold intrusion issues and other challenges in Mixup applications. Further inquiry into the semantics and characteristics of the interpolated sentences and why these interpolations prove effective for sentence classification is also desirable.

In summary, the application of Mixup to NLP represents an effective augmentation strategy that enhances model robustness, suggesting a promising direction for future research in data-efficient deep learning strategies in NLP.