Word-level Textual Adversarial Attacking as Combinatorial Optimization (1910.12196v4)

Published 27 Oct 2019 in cs.CL, cs.AI, and cs.LG

Abstract: Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attacking is challenging because text is discrete and a small perturbation can bring significant change to the original input. Word-level attacking, which can be regarded as a combinatorial optimization problem, is a well-studied class of textual attack methods. However, existing word-level attack models are far from perfect, largely because unsuitable search space reduction methods and inefficient optimization algorithms are employed. In this paper, we propose a novel attack model, which incorporates the sememe-based word substitution method and particle swarm optimization-based search algorithm to solve the two problems separately. We conduct exhaustive experiments to evaluate our attack model by attacking BiLSTM and BERT on three benchmark datasets. Experimental results demonstrate that our model consistently achieves much higher attack success rates and crafts more high-quality adversarial examples as compared to baseline methods. Also, further experiments show our model has higher transferability and can bring more robustness enhancement to victim models by adversarial training. All the code and data of this paper can be obtained on https://github.com/thunlp/SememePSO-Attack.

Summary

The paper's main contribution is to frame word-level adversarial attacking as a combinatorial optimization problem and to solve it with sememe-based word substitution and particle swarm optimization (PSO).
Compared with baseline methods, the approach achieves higher attack success rates, lower modification rates, and better grammaticality and fluency in the resulting adversarial examples.
Experiments attacking BiLSTM and BERT on three benchmark datasets (IMDB, SST-2, and SNLI) support these improvements.

Overview of "Word-level Textual Adversarial Attacking as Combinatorial Optimization"

The paper "Word-level Textual Adversarial Attacking as Combinatorial Optimization" introduces a framework for making adversarial attacks on neural text models more effective. Its key move is to treat a word-level attack as a combinatorial optimization problem, which exposes the two weaknesses it identifies in existing attacks: unsuitable methods for reducing the search space and inefficient algorithms for searching it.
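One way to write down the combinatorial view is sketched below. The notation is illustrative and not taken from the paper: it assumes each word has a finite set of allowed substitutes and that the attacker wants to maximize the victim model's probability for some target label.

```latex
% Illustrative notation (an assumption, not the paper's own symbols); requires amsmath.
% x = (w_1, ..., w_n): the original input sentence.
% S(w_i): the allowed substitutes for w_i, including w_i itself.
% f_{\hat{y}}(x'): the victim model's predicted probability of the target label \hat{y}.
\[
  x^{*} \;=\; \operatorname*{arg\,max}_{x' \in S(w_1) \times \cdots \times S(w_n)} f_{\hat{y}}(x'),
  \qquad
  \lvert \text{search space} \rvert \;=\; \prod_{i=1}^{n} \lvert S(w_i) \rvert .
\]
```

Under this reading, the size of the search space grows multiplicatively with sentence length, which is why both the choice of substitute sets and the search algorithm matter.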

Methodological Innovation

The authors propose an approach that divides the adversarial attack process into two key steps:

Search Space Reduction: The paper introduces a sememe-based word substitution method. Sememes are the minimum semantic units of language, so restricting substitutes to words with matching sememes keeps substitutions semantically consistent. The paper reports that this method finds more candidate substitutes than methods based on word embeddings or synonym resources such as WordNet, while preserving grammaticality and meaning.
Adversarial Example Search Algorithm: The authors employ Particle Swarm Optimization (PSO) as a search algorithm for generating adversarial examples. PSO, compared to other strategies such as genetic algorithms or greedy algorithms, is shown to provide more efficient convergence in finding successful attacks, even under limited information about the victim models (black-box setting).
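The two steps above can be sketched together in a few lines of Python. This is a simplified illustration, not the authors' implementation: the sememe lexicon is made up (a real attack would read sememe annotations from a resource such as HowNet), the substitution rule assumes two words are interchangeable when they share a sense with an identical sememe set, and `swarm_search` is a reduced swarm-style search whose parameters (`p_local`, `p_global`, `p_mutate`) and update rule are assumptions rather than the paper's PSO equations. The `score` callable stands in for a black-box query to the victim model.

```python
import random

# Hypothetical toy lexicon: word -> list of senses, each sense a set of sememes.
# The entries are invented for illustration only.
TOY_SEMEMES = {
    "film": [frozenset({"shows", "recreation"})],
    "movie": [frozenset({"shows", "recreation"})],
    "picture": [frozenset({"shows", "recreation"}), frozenset({"image"})],
    "great": [frozenset({"good", "degree"})],
    "terrific": [frozenset({"good", "degree"})],
    "bad": [frozenset({"bad"})],
}


def sememe_substitutes(word, lexicon):
    """Return words sharing at least one sense with an identical sememe set."""
    senses = set(lexicon.get(word, []))
    return sorted(w for w, s in lexicon.items() if w != word and senses & set(s))


def swarm_search(words, candidates, score, n_particles=8, n_iters=20,
                 p_local=0.3, p_global=0.5, p_mutate=0.3, seed=0):
    """Simplified swarm search over substitutions; higher score is better."""
    rng = random.Random(seed)
    positions = [i for i, w in enumerate(words) if candidates.get(w)]

    def mutate(sentence):
        # Re-sample one substitutable position from its candidates (or the original word).
        new = list(sentence)
        if positions:
            i = rng.choice(positions)
            new[i] = rng.choice(candidates[words[i]] + [words[i]])
        return new

    swarm = [mutate(words) for _ in range(n_particles)]
    local_best = [list(p) for p in swarm]
    global_best = list(max(swarm, key=score))

    for _ in range(n_iters):
        for k in range(n_particles):
            particle = list(swarm[k])
            # Move each position toward the particle's own best and the swarm's best.
            for i in range(len(words)):
                if rng.random() < p_local:
                    particle[i] = local_best[k][i]
                if rng.random() < p_global:
                    particle[i] = global_best[i]
            if rng.random() < p_mutate:
                particle = mutate(particle)
            swarm[k] = particle
            if score(particle) > score(local_best[k]):
                local_best[k] = list(particle)
            if score(particle) > score(global_best):
                global_best = list(particle)
    return global_best


if __name__ == "__main__":
    sentence = ["the", "film", "was", "great"]
    cands = {w: sememe_substitutes(w, TOY_SEMEMES) for w in sentence}
    # Toy stand-in for a victim model: reward sentences that differ from the original.
    toy_score = lambda s: sum(a != b for a, b in zip(s, sentence))
    print(swarm_search(sentence, cands, toy_score))
```

In a real attack, `score` would query the victim model for the probability of the target label, and the search would stop as soon as the prediction flips.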

Empirical Evaluation

The paper extensively evaluates the proposed adversarial attack framework on BiLSTM and BERT models across three datasets: IMDB, SST-2, and SNLI. The success rates, adversarial example quality (measured in terms of modification rate, grammaticality, and fluency), attack validity, and transferability of adversarial examples are presented as key metrics.

The proposed model achieves higher attack success rates than the baselines on all tested victim models, reaching 100% for BiLSTM on the IMDB dataset.
Compared to baseline methods, the Sememe+PSO approach changes fewer words (lower modification rate), introduces fewer additional grammatical errors, and produces more fluent adversarial examples.
Human evaluation reveals that the validity of attacks, which represents semantic consistency of adversarial examples, is competitive with or superior to existing techniques.
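Two of the automatic metrics above are simple ratios, sketched here under assumed definitions: modification rate as the fraction of word positions changed, and attack success rate as the fraction of attacked examples whose adversarial version is misclassified. The function names and the convention of using `None` for a failed search are hypothetical, and the paper's exact definitions may differ in detail.

```python
def modification_rate(original, adversarial):
    """Fraction of word positions that differ between two equal-length sentences."""
    if len(original) != len(adversarial):
        raise ValueError("word-level substitution keeps sentence length fixed")
    if not original:
        return 0.0
    changed = sum(o != a for o, a in zip(original, adversarial))
    return changed / len(original)


def attack_success_rate(true_labels, adversarial_predictions):
    """Fraction of examples whose adversarial prediction differs from the true label.

    A prediction of None means no adversarial example was found (assumed convention).
    """
    if len(true_labels) != len(adversarial_predictions):
        raise ValueError("one prediction is expected per attacked example")
    if not true_labels:
        return 0.0
    successes = sum(
        pred is not None and pred != label
        for label, pred in zip(true_labels, adversarial_predictions)
    )
    return successes / len(true_labels)


if __name__ == "__main__":
    print(modification_rate(["the", "film", "was", "great"],
                            ["the", "movie", "was", "great"]))
    print(attack_success_rate([1, 0, 1, 1], [0, 0, None, 0]))
```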

Implications and Future Directions

This research has several important implications. The sememe-based substitution method's ability to generate semantically consistent adversarial examples could inspire further exploration into semantic-level attacks, particularly in contexts where linguistic nuances are crucial. Likewise, the application of PSO in adversarial settings offers a robust alternative to traditional genetic algorithms, suggesting potential cross-application in other domains beyond text.

The paper already reports that adversarial training on its examples improves the robustness of victim models; future work could build on this with defensive training strategies designed around such semantically consistent examples. The transferability of the adversarial examples across model architectures also points toward more general adversarial evaluation benchmarks for NLP tasks.

In summary, the paper advances adversarial NLP by pairing a substitution method that preserves meaning with an efficient search algorithm, which benefits both attack and defense research on neural NLP models such as BiLSTM and BERT.

GitHub

GitHub - thunlp/SememePSO-Attack: Code and data of the ACL 2020 paper "Word-level Textual Adversarial Attacking as Combinatorial Optimization" (86 stars)