- The paper shows that Vision Transformers suffer from severe vulnerabilities under white-box adversarial attacks, with robust accuracies dropping to 0% for some methods.
- It finds that adversarial examples exhibit low transferability between Vision Transformers, CNNs, and Big Transfer Models, indicating potential benefits of ensemble defenses.
- The study introduces the Self-Attention Gradient Attack (SAGA), which effectively compromises ensemble defenses and encourages the development of new robust security strategies.
Analyzing the Robustness of Vision Transformers to Adversarial Attacks
The paper "On the Robustness of Vision Transformers to Adversarial Examples" provides a detailed investigation into the security of Vision Transformers (ViT) against adversarial attacks. This paper addresses a gap in research concerning the relative security of Vision Transformers as compared to the more extensively studied Convolutional Neural Networks (CNNs). In undertaking this, the authors leverage a broad suite of adversarial attacks and defenses to assess robustness and transferability, yielding insights with both practical and theoretical implications for deep learning models.
First, the work shows that Vision Transformers, despite their recent emergence as a promising alternative for image classification, are vulnerable to a comprehensive set of white-box adversarial attacks, much like CNNs. The attacks include FGSM, PGD, MIM, BPDA, C&W, and APGD, and they consistently drive robust accuracy down across datasets such as CIFAR-10, CIFAR-100, and ImageNet. Notably, this lack of robustness contradicts earlier hypotheses that transformers might be inherently more resistant because of their self-attention mechanisms: the empirical results show robust accuracies as low as 0% for some attacks.
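To make the white-box setting concrete, the snippet below is a minimal L-infinity PGD sketch in PyTorch. It is not the paper's exact configuration: `model`, `images`, `labels`, and the epsilon, step size, and iteration count are placeholders chosen for illustration.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeatedly step along the sign of the loss gradient,
    projecting back into the eps-ball around the clean images."""
    images = images.clone().detach()
    adv = images + torch.empty_like(images).uniform_(-eps, eps)  # random start
    adv = adv.clamp(0, 1).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()                      # ascent step
        adv = torch.min(torch.max(adv, images - eps), images + eps)  # project to eps-ball
        adv = adv.clamp(0, 1)                                        # keep valid pixel range
    return adv.detach()
```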
Where the paper stands out is in its analysis of adversarial transferability across architectures. It shows that adversarial examples often fail to transfer between Vision Transformers, CNNs, and Big Transfer Models. The low transfer rates between these model classes imply that adversarial examples crafted against one architecture may not generalize to another, suggesting that ensembles of diverse architectures could increase robustness, as sketched below.
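A transfer experiment of the kind described here can be sketched as follows. It reuses the hypothetical `pgd_attack` routine above; `source_model`, `target_model`, and `loader` are placeholders rather than the paper's exact models or evaluation protocol.

```python
def transfer_rate(source_model, target_model, loader, device="cuda"):
    """Craft adversarial examples on source_model and report how often they
    also fool target_model, i.e., the transferability between architectures."""
    fooled, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = pgd_attack(source_model, images, labels)   # white-box on the source
        preds = target_model(adv).argmax(dim=1)          # evaluated on the target
        fooled += (preds != labels).sum().item()
        total += labels.numel()
    return fooled / total  # low values indicate poor cross-architecture transfer
```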
The paper then shows, however, that simple ensemble defenses do not hold up under white-box conditions by introducing the Self-Attention Gradient Attack (SAGA). This novel attack defeats ensembles by combining gradients from multiple models, with the Vision Transformer gradients weighted by their self-attention maps, and it achieves high attack success rates. The effectiveness of SAGA underscores how difficult it is to secure transformer-based models against an adaptive white-box adversary.
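The following is a simplified single-step sketch of an attention-guided ensemble attack in the spirit of SAGA, not the paper's exact formulation. It assumes each ViT is paired with an `attn_fn` that returns a self-attention map (e.g., an attention rollout) broadcastable to the image shape; the weighting scheme and hyperparameters are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def saga_step(vits, cnns, x_adv, labels, alpha=2/255):
    """One step of a self-attention-guided ensemble attack: ViT gradients are
    modulated by their attention maps, CNN gradients are used directly, and
    the combined gradient drives a signed ascent step (simplified sketch)."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    total_grad = torch.zeros_like(x_adv)

    for model, attn_fn in vits:              # (ViT, attention-map function) pairs
        loss = F.cross_entropy(model(x_adv), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        attn = attn_fn(x_adv).detach()       # self-attention map over the input
        total_grad += attn * grad            # focus the perturbation on attended regions

    for model in cnns:                       # CNN members contribute raw gradients
        loss = F.cross_entropy(model(x_adv), labels)
        total_grad += torch.autograd.grad(loss, x_adv)[0]

    return (x_adv + alpha * total_grad.sign()).clamp(0, 1).detach()
```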
Against black-box adversaries, however, ensemble defenses are more promising. By leveraging the naturally low transferability between architectures, the authors show that combining Vision Transformers with Big Transfer Models in an ensemble yields substantial robustness without sacrificing clean accuracy. This result motivates further research into robust learning paradigms that combine diverse architectures with adaptive security practices.
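As a minimal sketch of one plausible way to combine a ViT and a Big Transfer model, the code below accepts a prediction only when the two models agree and flags disagreement; this agreement rule is an illustrative assumption, not necessarily the paper's exact ensemble scheme, and `vit` and `bit` are placeholder models.

```python
import torch

def agreement_ensemble(vit, bit, images, reject_label=-1):
    """Classify with both a ViT and a Big Transfer model; accept the label only
    when they agree, otherwise mark the input with reject_label."""
    with torch.no_grad():
        vit_pred = vit(images).argmax(dim=1)
        bit_pred = bit(images).argmax(dim=1)
    agree = vit_pred == bit_pred
    preds = torch.where(agree, vit_pred, torch.full_like(vit_pred, reject_label))
    return preds, agree  # reject_label marks inputs the ensemble declines to classify
```

Because adversarial examples transfer poorly between the two architectures, a perturbation crafted against one member is unlikely to fool both, which is the intuition behind the reported black-box robustness.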
The implications extend to future defense strategies, which may focus on more heterogeneous model ensembles or adaptive, adversary-aware training. While the paper offers a snapshot of the adversarial challenges currently facing Vision Transformers, it lays the groundwork for an evolving discussion on deploying more secure AI systems and invites deeper exploration of how attention mechanisms and convolutional encodings can be combined so that Vision Transformers improve not only in accuracy but also in resilience to adversarial manipulation.