Sparse and Imperceivable Adversarial Attacks (1909.05040v1)

Published 11 Sep 2019 in cs.LG, cs.CR, cs.CV, and stat.ML

Abstract: Neural networks have been proven to be vulnerable to a variety of adversarial attacks. From a safety perspective, highly sparse adversarial attacks are particularly dangerous. On the other hand the pixelwise perturbations of sparse attacks are typically large and thus can be potentially detected. We propose a new black-box technique to craft adversarial examples aiming at minimizing $l_0$-distance to the original image. Extensive experiments show that our attack is better or competitive to the state of the art. Moreover, we can integrate additional bounds on the componentwise perturbation. Allowing pixels to change only in region of high variation and avoiding changes along axis-aligned edges makes our adversarial examples almost non-perceivable. Moreover, we adapt the Projected Gradient Descent attack to the $l_0$-norm integrating componentwise constraints. This allows us to do adversarial training to enhance the robustness of classifiers against sparse and imperceivable adversarial manipulations.



Summary

The paper introduces a black-box attack that generates sparse adversarial examples by minimizing the $l_0$-distance to the original image, and separately adapts PGD to the $l_0$-norm.
It integrates locally adaptive componentwise constraints that keep the perturbations imperceivable by changing pixels only in high-variation regions and avoiding axis-aligned edges.
Experimental results on datasets such as MNIST and CIFAR-10 show that the attack often succeeds while modifying less than 1% of the pixels, highlighting its potential threat in security-critical applications.

Sparse and Imperceivable Adversarial Attacks

The research paper "Sparse and Imperceivable Adversarial Attacks" by Francesco Croce and Matthias Hein addresses a challenging aspect of machine learning security: the vulnerability of neural networks to adversarial examples. While the susceptibility of neural networks to even minor adversarial perturbations is well documented, this paper focuses on crafting attacks that are both sparse, altering only a minimal number of pixels, and imperceivable, meaning the changes cannot be detected by the human eye.

Summary of Key Contributions

The authors introduce a novel black-box technique for generating adversarial examples, emphasizing the minimization of the $l_0$-norm, which measures the sparsity of changes. This approach is especially pertinent in safety-critical applications where robust decision-making is essential. The significant contributions of this paper include:

Black-Box Attack with Local Search: The proposed black-box, local-search method is better than or competitive with existing $l_0$-attacks in success rate while respecting the sparsity constraint.
$l_0$-Norm Adaptation of PGD Attack: The authors adapt the Projected Gradient Descent (PGD) attack to the $l_0$-norm and incorporate additional componentwise constraints to ensure imperceivability.
Integration of Componentwise Constraints: Both attacks can take additional bounds on the per-pixel perturbation. The authors propose locally adaptive bounds that allow pixel changes only in regions of high variation and avoid changes along axis-aligned edges, which makes the adversarial examples almost non-perceivable.
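
The last two contributions can be illustrated with a minimal sketch of a projection step that enforces both a sparsity budget and locally adaptive per-pixel bounds. This is an illustration under stated assumptions, not the authors' implementation: the function names `variation_map` and `project_sparse_box`, the parameters `k` and `kappa`, the multiplicative form of the bounds, and the use of the smaller of the horizontal and vertical standard deviations as the variation score are all choices made here for a grayscale image stored as nested lists with values in [0, 1].

```python
from statistics import pstdev


def variation_map(img):
    """Per-pixel variation score for a grayscale image (list of rows).

    Assumption of this sketch: the score is the square root of the smaller
    of the horizontal and vertical standard deviations over a pixel and its
    two neighbours, so it is zero on flat regions and along axis-aligned edges.
    """
    h, w = len(img), len(img[0])
    sigma = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Border pixels reuse themselves as the missing neighbour.
            row = [img[i][max(j - 1, 0)], img[i][j], img[i][min(j + 1, w - 1)]]
            col = [img[max(i - 1, 0)][j], img[i][j], img[min(i + 1, h - 1)][j]]
            sigma[i][j] = min(pstdev(row), pstdev(col)) ** 0.5
    return sigma


def project_sparse_box(img, candidate, k, kappa):
    """Map `candidate` to an image that differs from `img` in at most k pixels,
    with every changed pixel inside its locally adaptive box."""
    sigma = variation_map(img)
    scored = []
    for i, row in enumerate(img):
        for j, x in enumerate(row):
            # Hypothetical multiplicative bounds, also clipped to [0, 1].
            lo = max(0.0, (1 - kappa * sigma[i][j]) * x)
            hi = min(1.0, (1 + kappa * sigma[i][j]) * x)
            c = candidate[i][j]
            z = min(max(c, lo), hi)
            # How much closer to the candidate we get by changing this pixel.
            gain = (c - x) ** 2 - (c - z) ** 2
            scored.append((gain, i, j, z))
    scored.sort(reverse=True)
    out = [row[:] for row in img]
    for gain, i, j, z in scored[:k]:
        if gain > 0:
            out[i][j] = z
    return out
```

In a PGD-style loop, a step along the gradient would produce `candidate`, and this function would bring it back into the feasible set. On an image that contains only a vertical edge, the variation score is zero everywhere, so the sketch leaves the image unchanged regardless of `candidate`.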

Experimental Results

Extensive experiments demonstrate that the new attacks are better than or competitive with state-of-the-art methods in success rate while modifying fewer pixels. For example, on datasets such as MNIST and CIFAR-10, the proposed CornerSearch algorithm requires fewer pixel modifications than existing methods, often less than 1% of the pixels.
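
To put the 1% figure in concrete terms, an MNIST image has 28 × 28 = 784 pixels and a CIFAR-10 image has 32 × 32 = 1024 pixels. The short sketch below computes the corresponding pixel budget and counts how many pixels differ between two images. The helper names `pixel_budget` and `l0_distance` are hypothetical and chosen here for illustration; a pixel may be a single float (grayscale) or a tuple of channel values (color), and it counts as changed if any channel differs.

```python
def pixel_budget(height, width, fraction):
    """Largest whole number of pixels that stays within `fraction` of the image."""
    return int(height * width * fraction)


def l0_distance(original, adversarial):
    """Number of pixel positions at which the two images differ."""
    return sum(
        1
        for row_o, row_a in zip(original, adversarial)
        for px_o, px_a in zip(row_o, row_a)
        if px_o != px_a
    )


print(pixel_budget(28, 28, 0.01))  # MNIST: prints 7
print(pixel_budget(32, 32, 0.01))  # CIFAR-10: prints 10
```

A budget of fewer than a dozen pixels is what makes such perturbations hard to spot, provided the per-pixel change is also kept small.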

Implications and Future Directions

The implications of this research are notable for the development of more resilient AI systems, particularly in settings where an attack that goes unnoticed can exploit system vulnerabilities. Sparse attacks typically rely on large per-pixel changes and can therefore potentially be detected; the success of attacks that are both sparse and imperceivable shows that this safeguard cannot be relied on. Adversarial examples crafted with the presented methods show a 50-70% success rate on standard models, which, although lower than that of other attacks, is significant enough to warrant attention to such vulnerabilities.

Furthermore, the paper explores adversarial training as a defense mechanism against these sparse attacks, providing evidence that adversarial training based on $l_2$ or $l_\infty$ norms can partially mitigate the risks of $l_0$-attacks. However, to defend specifically against $l_0$ and imperceivable attacks, the paper suggests adversarial training tailored to these threat models, which the $l_0$ adaptation of PGD makes possible.

Speculation on Future Developments

Future work could focus on further refining the balance between sparsity and success rates. Additionally, exploring combinations of adversarial training with other defense mechanisms might enhance robustness across different types of adversarial settings. With neural networks increasingly deployed in security-critical environments, exploring mechanisms against both sparse and dense attacks remains a pivotal research area.

Overall, this paper underscores the urgency of addressing vulnerabilities in AI systems. By demonstrating attacks that exploit both sparsity and imperceptibility, it opens avenues for developing improved defensive strategies that ensure the reliability and safety of neural network applications.



GitHub

GitHub - fra31/sparse-imperceivable-attacks: Sparse and Imperceivable Adversarial Attacks (accepted to ICCV 2019). (40 stars)