When Vision Fails: Text Attacks Against ViT and OCR (2306.07033v1)
Abstract: While text-based machine learning models that operate on visual inputs of rendered text have become robust against a wide range of existing attacks, we show that they are still vulnerable to visual adversarial examples encoded as text. We use the Unicode functionality of combining diacritical marks to manipulate encoded text so that small visual perturbations appear when the text is rendered. We show how a genetic algorithm can be used to generate visual adversarial examples in a black-box setting, and conduct a user study to establish that the model-fooling adversarial examples do not affect human comprehension. We demonstrate the effectiveness of these attacks in the real world by creating adversarial examples against production models published by Facebook, Microsoft, IBM, and Google.
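The core perturbation the abstract describes — appending Unicode combining diacritical marks so the encoded text changes while the rendered text gains only small visual strokes — can be sketched as below. This is a minimal illustration, not the paper's implementation: the mark range, the `budget` parameter, and the toy `evolve` search loop with its black-box `fitness` callback are all illustrative assumptions.

```python
import random
import unicodedata

# Combining Diacritical Marks block (U+0300-U+036F). The paper also draws on
# the Extended, Supplement, for Symbols, and Half Marks blocks.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]

def perturb(text: str, budget: int, rng: random.Random) -> str:
    """Append `budget` randomly chosen combining marks after randomly chosen
    base characters. The underlying encoding changes, but when rendered the
    text differs only by small diacritic strokes."""
    chars = list(text)
    for _ in range(budget):
        i = rng.randrange(len(chars))
        chars[i] = chars[i] + rng.choice(COMBINING_MARKS)
    return "".join(chars)

def strip_marks(text: str) -> str:
    """Remove combining marks, recovering the base text a human reads."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))

def evolve(text: str, fitness, generations: int = 20,
           pop: int = 16, budget: int = 4, seed: int = 0) -> str:
    """Toy black-box search (a stand-in for the paper's genetic algorithm):
    keep whichever candidate maximises a caller-supplied `fitness` score,
    e.g. the drop in a target model's confidence on its original prediction."""
    rng = random.Random(seed)
    best = text
    for _ in range(generations):
        candidates = [perturb(text, budget, rng) for _ in range(pop)]
        best = max(candidates + [best], key=fitness)
    return best
```

In a real attack, `fitness` would query the target model (e.g. a visual-text translation system or OCR pipeline) and score how far its output has degraded, keeping the perturbation budget small so human comprehension is unaffected.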