One pixel attack for fooling deep neural networks (1710.08864v7)

Published 24 Oct 2017 in cs.LG, cs.CV, and stat.ML

Abstract: Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 67.97% of the natural images in Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel with 74.03% and 22.91% confidence on average. We also show the same vulnerability on the original CIFAR-10 dataset. Thus, the proposed attack explores a different take on adversarial machine learning in an extreme limited scenario, showing that current DNNs are also vulnerable to such low dimension attacks. Besides, we also illustrate an important application of DE (or broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness.

Citations (2,199)

Summary

- The paper demonstrates that modifying a single pixel, chosen via differential evolution, can fool DNNs, with success rates of 63%-72% on CIFAR-10 and 16% on ImageNet.
- The attack works across several network architectures and remains effective when extended to multi-pixel perturbations.
- The study highlights the efficiency and low perceptual distortion of the DE-based attack, underscoring the need for stronger adversarial defenses in DNNs.

Analysis of "One Pixel Attack for Fooling Deep Neural Networks"

The paper "One Pixel Attack for Fooling Deep Neural Networks" by Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai explores the vulnerabilities of Deep Neural Networks (DNNs) by introducing a novel attack paradigm that manipulates just a single pixel to generate adversarial examples. This paper contributes significantly to adversarial machine learning by showcasing the extent of susceptibility in DNNs under extremely constrained perturbation scenarios.

The research uses the differential evolution (DE) algorithm to craft what the authors term "one-pixel attacks." Unlike many prior methods that rely on gradient information or extensive perturbation, this approach is a black-box attack: it needs only the network's output probabilities, not its gradients or architecture. The attack is evaluated on three well-established neural network architectures trained on the CIFAR-10 dataset (All Convolutional Network (AllConv), Network in Network (NiN), and VGG16) and on the BVLC AlexNet model trained on the ImageNet dataset.
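
The search described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: each candidate encodes one pixel as (x, y, r, g, b), new candidates are formed by the DE mutation rule, and the only feedback is the model's output probability. The classifier `toy_predict`, the helper names, and the population size, generation count, and scale factor `f` are all assumptions made for the example.

```python
import random

def apply_pixel(image, cand):
    """Return a copy of `image` with one pixel overwritten by cand = (x, y, r, g, b)."""
    x, y, r, g, b = cand
    out = [row[:] for row in image]
    out[y][x] = (r, g, b)
    return out

def clip(cand, width, height):
    """Round a candidate to integers and clamp it to valid coordinates and colours."""
    x, y, r, g, b = (int(round(v)) for v in cand)
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    r, g, b = (min(max(c, 0), 255) for c in (r, g, b))
    return (x, y, r, g, b)

def one_pixel_attack(image, predict, true_label,
                     pop_size=40, generations=30, f=0.5, seed=0):
    """Non-targeted sketch: minimise the probability of `true_label`.

    `predict` is treated as a black box that maps an image to a list of
    class probabilities. All numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])

    def fitness(cand):
        return predict(apply_pixel(image, cand))[true_label]

    # Initial population: uniform positions, Gaussian colours (assumed settings).
    pop = [clip((rng.uniform(0, width - 1), rng.uniform(0, height - 1),
                 rng.gauss(128, 127), rng.gauss(128, 127), rng.gauss(128, 127)),
                width, height)
           for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # DE mutation: combine three distinct, randomly chosen members.
            r1, r2, r3 = rng.sample(range(pop_size), 3)
            child = clip(tuple(pop[r1][k] + f * (pop[r2][k] - pop[r3][k])
                               for k in range(5)), width, height)
            score = fitness(child)
            if score < scores[i]:  # the child replaces its parent only if better
                pop[i], scores[i] = child, score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def toy_predict(image):
    """Hypothetical two-class model: class 1 fires on any strongly red pixel."""
    score = max(max(0, r - b) for row in image for (r, g, b) in row) / 255
    return [1 - score, score]

gray = [[(100, 100, 100)] * 8 for _ in range(8)]
best, prob = one_pixel_attack(gray, toy_predict, true_label=0)
adversarial = apply_pixel(gray, best)
```

Because the loop only ever calls `predict`, the same sketch would apply to any model that exposes class probabilities, which is what makes the attack black-box.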

Major Findings and Contributions

Attack Success Rates and Confidence:
- On the CIFAR-10 dataset, the one-pixel perturbation yielded success rates of 68.71%, 71.66%, and 63.53% on the AllConv, NiN, and VGG16 networks, respectively. The abstract reports the overall figures as a 67.97% success rate with an average confidence of 74.03% in the adversarial class.
- For the ImageNet dataset, the one-pixel attack altered the classification of 16.04% of the images, with an average confidence of 22.91% in the resulting wrong class.
Versatility and Flexibility:
- The experiments show that the one-pixel attack works across different DNN structures. Relaxing the constraint to three or five modifiable pixels raises both the success rate and the number of target classes an image can be perturbed into.
Geometric Insights:
- The paper provides a geometric interpretation of adversarial perturbations: even modifications restricted to low-dimensional slices of the input space (a single pixel's position and color) can uncover vulnerabilities in DNNs. The results suggest that many natural images lie closer to decision boundaries in the high-dimensional input space than previously understood.
Comparison with Random and Other Perturbation Methods:
- When compared to random one-pixel modifications, the differential evolution method showed higher efficacy, demonstrating that the DE algorithm's structured search significantly outperforms random trials in locating adversarial perturbations.
Time Complexity and Distortion:
- The cost of the DE-based one-pixel attack is measured by the average number of network evaluations per attack, which the paper reports as manageable even for larger datasets like ImageNet. The paper also quantifies the average distortion introduced by a one-pixel modification, which corresponds to a minimal perceptual change.
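
As a quick consistency check on the numbers above, the 67.97% success rate quoted in the abstract matches the unweighted mean of the three per-network CIFAR-10 rates. The snippet below only does that arithmetic; reading the abstract's figure as this mean is an inference from the numbers, not something the text states.

```python
# Per-network one-pixel success rates on CIFAR-10, as listed above (percent).
rates = {"AllConv": 68.71, "NiN": 71.66, "VGG16": 63.53}

# Unweighted mean across the three networks.
mean_rate = sum(rates.values()) / len(rates)
print(f"{mean_rate:.2f}%")  # prints "67.97%"
```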

Theoretical and Practical Implications

The findings have notable implications for both theoretical research and practical applications. Theoretically, they underscore the importance of examining DNN robustness not only against large perturbations but also under strict constraints, thereby contributing to a more nuanced understanding of DNN vulnerabilities. Practically, the insights from one-pixel attacks caution against over-reliance on current DNNs in security-critical applications without robust adversarial training or detection mechanisms.

Future developments in AI and security domains should consider more advanced adversarial defenses. Techniques to improve DNN robustness could include more stringent testing across a wide range of perturbations, including those mimicking the one-pixel attack. Additionally, evolutionary algorithms beyond DE, such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), could be explored for more efficient adversarial sample generation.
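
One inexpensive form of such testing is a random one-pixel probe: overwrite a randomly chosen pixel with a random color many times and record how often the predicted label changes. The sketch below is a random baseline rather than the DE attack, so it gives only a lower bound on how fragile a model is. The function name, the classifier `toy_label`, and the trial count are assumptions made for illustration.

```python
import random

def random_one_pixel_flip_rate(image, predict_label, trials=200, seed=0):
    """Fraction of random single-pixel edits that change the predicted label."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    original = predict_label(image)
    flips = 0
    for _ in range(trials):
        x, y = rng.randrange(width), rng.randrange(height)
        perturbed = [row[:] for row in image]
        perturbed[y][x] = tuple(rng.randrange(256) for _ in range(3))
        if predict_label(perturbed) != original:
            flips += 1
    return flips / trials

def toy_label(image):
    """Hypothetical classifier: label 1 if any pixel is strongly red."""
    return int(any(r - b > 128 for row in image for (r, g, b) in row))

gray = [[(100, 100, 100)] * 8 for _ in range(8)]
rate = random_one_pixel_flip_rate(gray, toy_label)
```

A nonzero `rate` on clean inputs would already signal sensitivity to single-pixel changes; a rate of zero does not show robustness, since a guided search such as DE may still find a flipping pixel.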

Conclusion

This paper opens a critical discussion on the robustness of DNNs by highlighting their vulnerability to extremely low-dimensional perturbations. The proposed one-pixel attack not only aligns with existing adversarial research but also pushes the boundary by requiring minimal perturbation, thereby providing a valuable tool for testing and improving the security of DNN models. Consequently, further exploration and enhancement of adversarial attack and defense mechanisms remain essential in advancing the reliability of AI systems.