Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling (2401.01830v1)
Abstract: Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in NLP as it has in computer vision. In this paper, we propose a novel text augmentation method that leverages the Fill-Mask feature of the transformer-based BERT model. Our method involves iteratively masking words in a sentence and replacing them with language model predictions. We have tested our proposed method on various NLP tasks and found it to be effective in many cases. Our results are presented along with a comparison to existing augmentation methods. Experimental results show that our proposed method significantly improves performance, especially on topic classification datasets.
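The iterative masking loop described in the abstract can be sketched as follows. This is a minimal illustration of the control flow, not the authors' implementation: a real version would obtain predictions from a BERT fill-mask model (e.g. via Hugging Face's `pipeline("fill-mask")`), while here a toy `predict_mask` function stands in so the loop is self-contained and runnable.

```python
MASK = "[MASK]"

def predict_mask(tokens, mask_index):
    """Toy stand-in for a masked language model's top prediction.

    A real implementation would run the masked sentence through a
    BERT fill-mask head and return the highest-scoring token.
    """
    return "text"

def iterative_mask_fill(sentence):
    """Mask each word in turn and replace it with the model's prediction.

    Because each replacement is kept before the next word is masked,
    later predictions are conditioned on the already-augmented sentence,
    which is what makes the procedure iterative rather than one-shot.
    """
    tokens = sentence.split()
    for i in range(len(tokens)):
        tokens[i] = MASK                      # mask the i-th word
        tokens[i] = predict_mask(tokens, i)   # fill it with the LM prediction
    return " ".join(tokens)
```

With the toy predictor every word is replaced by the same token; swapping in an actual fill-mask model yields a paraphrased variant of the input sentence that can be added to the training set.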