On Adversarial Examples for Text Classification by Perturbing Latent Representations (2405.03789v1)
Abstract: Recent advances in deep learning have significantly improved text classification. This improvement comes at a cost, however, because deep learning models are vulnerable to adversarial examples, which shows that they are not robust. Since the input to a text classifier is discrete, the classifier is shielded from gradient-based state-of-the-art attacks. Nonetheless, previous work has devised black-box attacks that successfully manipulate the discrete input values to find adversarial examples. Therefore, instead of changing the discrete values directly, we transform the input into its real-valued embedding vector and apply state-of-the-art white-box attacks in that latent space. We then convert the perturbed embedding vector back into text and call the result an adversarial example. In summary, we create a framework that measures the robustness of a text classifier using the classifier's gradients.
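The pipeline described above (embed the text, perturb the embedding with a gradient-based attack, then map the result back to discrete tokens) can be sketched in code. The following is a minimal illustration under stated assumptions, not the paper's actual implementation: it uses FGSM as the white-box attack and a nearest-neighbor lookup in the embedding matrix as the "convert back to text" step; `classifier` is assumed to accept embedded inputs directly, and the function name `embedding_fgsm_attack` is hypothetical.

```python
import torch
import torch.nn.functional as F

def embedding_fgsm_attack(classifier, embedding_matrix, token_ids, label, epsilon=0.1):
    # Look up the continuous embedding vectors for the discrete input tokens.
    embeds = embedding_matrix[token_ids].clone().detach().requires_grad_(True)

    # Forward pass through a classifier assumed to accept embeddings directly.
    logits = classifier(embeds.unsqueeze(0))               # shape: (1, num_classes)
    loss = F.cross_entropy(logits, torch.tensor([label]))
    loss.backward()

    # FGSM step: move each embedding along the sign of the loss gradient.
    perturbed = embeds + epsilon * embeds.grad.sign()

    # Map each perturbed vector back to its nearest vocabulary embedding,
    # producing a discrete token sequence, i.e., the adversarial text candidate.
    distances = torch.cdist(perturbed.detach(), embedding_matrix)  # (seq_len, vocab)
    adv_token_ids = distances.argmin(dim=-1)
    return adv_token_ids
```

Nearest-neighbor projection is only the simplest way to discretize the perturbed embeddings; a stronger realization could instead optimize the perturbation under the constraint that it decodes to fluent text.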