On Generating Transferable Targeted Perturbations (2103.14641v2)

Published 26 Mar 2021 in cs.CV

Abstract: While the untargeted black-box transferability of adversarial perturbations has been extensively studied before, changing an unseen model's decisions to a specific 'targeted' class remains a challenging feat. In this paper, we propose a new generative approach for highly transferable targeted perturbations (TTP). We note that the existing methods are less suitable for this task due to their reliance on class-boundary information that changes from one model to another, thus reducing transferability. In contrast, our approach matches the perturbed image 'distribution' with that of the target class, leading to high targeted transferability rates. To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains. Based on the proposed objective, we train a generator function that can adaptively synthesize perturbations specific to a given input. Our generative approach is independent of the source or target domain labels, while consistently performing well against state-of-the-art methods on a wide range of attack settings. As an example, we achieve $32.63\%$ target transferability from (an adversarially weak) VGG19$_{BN}$ to (a strong) WideResNet on the ImageNet validation set, which is 4$\times$ higher than the previous best generative attack and 16$\times$ better than the instance-specific iterative attack. Code is available at: https://github.com/Muzammal-Naseer/TTP

Citations (66)

Summary

The paper introduces a novel generative approach that crafts adversarial perturbations aimed at specific target classes with enhanced model transferability.
It uses Kullback-Leibler divergence to align the distribution of perturbed inputs with that of the target class, and it outperforms instance-specific iterative attacks on targeted transfer.
Data augmentation and an ensemble of source models are used during training to improve targeted transfer to unseen models such as VGG, ResNet, and DenseNet.

An Overview of "On Generating Transferable Targeted Perturbations"

This paper presents an approach to crafting targeted adversarial perturbations that transfer well across different deep neural network models. The goal is to perturb input images so that an unknown (black-box) model classifies them into a specific target category, rather than merely misclassifying them.
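
The difference between the targeted and untargeted goals can be made concrete with a small sketch. The function names and the toy labels below are hypothetical and are not taken from the paper or its repository; they only illustrate how the two success criteria are counted.

```python
def untargeted_success_rate(predictions, true_labels):
    """Fraction of adversarial inputs classified as anything but the true label."""
    fooled = sum(1 for p, y in zip(predictions, true_labels) if p != y)
    return fooled / len(predictions)


def targeted_success_rate(predictions, target_label):
    """Fraction of adversarial inputs classified as the chosen target label."""
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


# Hypothetical black-box predictions on five adversarial images.
predictions = ["cat", "dog", "ship", "dog", "frog"]
true_labels = ["cat", "bird", "car", "deer", "horse"]

print(untargeted_success_rate(predictions, true_labels))  # 0.8
print(targeted_success_rate(predictions, "dog"))          # 0.4
```

In this toy case four of five images are misclassified, but only two land on the chosen target, which is why targeted transfer is the harder criterion.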

Key Contributions and Methodology

Generative Approach: The authors train a generator against a pretrained discriminator whose features may come from either supervised or unsupervised learning. Unlike previous methods, which rely on class-boundary information that changes from one model to another, this approach generates perturbations that match the global distribution and the local neighbourhood structure of the target class.
Loss Function Development: The objective uses Kullback-Leibler divergence to align the distribution of perturbed source images with that of target images in the discriminator's latent space. Matching distributions, rather than model-specific decision boundaries, is what promotes model-to-model transferability of the adversarial examples.
Augmented and Ensemble Learning: Because models differ from one another, the framework applies diverse augmentations during training as a regularization strategy, so that perturbations remain effective under input transformations. In addition, an ensemble of weaker models is used to improve how well the perturbations align with the target class distribution.
Experimental Evaluation: Experiments in black-box settings show high targeted success rates across various CNN architectures, including VGG, ResNet, and DenseNet. The technique outperforms other generative methods as well as instance-specific iterative attacks, while converging quickly and keeping computational overhead low.
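
The distribution-matching idea in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the symmetric form of the KL term, the use of cosine similarity for the neighbourhood structure, and all function names are assumptions of this sketch. The official repository linked in the abstract is the reference for the actual objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """KL is not symmetric, so this sketch sums both directions (an assumption)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def distribution_matching_loss(adv_logits, target_logits):
    """Global term: mean symmetric KL between paired adversarial and target outputs."""
    losses = [symmetric_kl(softmax(a), softmax(t))
              for a, t in zip(adv_logits, target_logits)]
    return sum(losses) / len(losses)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def neighbourhood_rows(features):
    """Row-wise softmax of the batch's pairwise cosine-similarity matrix."""
    return [softmax([cosine(f, g) for g in features]) for f in features]


def neighbourhood_loss(adv_features, target_features):
    """Local term: compare how samples relate to each other within each batch."""
    rows_adv = neighbourhood_rows(adv_features)
    rows_tgt = neighbourhood_rows(target_features)
    losses = [symmetric_kl(a, t) for a, t in zip(rows_adv, rows_tgt)]
    return sum(losses) / len(losses)


# Hypothetical discriminator outputs for a batch of two images (three classes).
adv = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
tgt = [[-1.0, 0.0, 2.0], [0.0, 0.5, 1.5]]

print(distribution_matching_loss(adv, tgt))  # positive: distributions differ
print(distribution_matching_loss(tgt, tgt))  # identical outputs give zero loss
print(neighbourhood_loss(adv, tgt))
```

In a real training loop, the adversarial outputs would come from the discriminator applied to generator-perturbed source images, and the generator would be updated to reduce these terms. How the terms are weighted or combined with the augmentation-based regularization is not stated in this summary.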

Notable Results

The proposed approach achieves 32.63% target transferability from VGG19$_{BN}$ to WideResNet on the ImageNet validation set, which is 4$\times$ higher than the previous best generative attack and 16$\times$ better than the instance-specific iterative attack. Furthermore, the ensemble learning strategy markedly increases transferability to ImageNet models trained with AugMix or stylized training, indicating that the attack remains effective against both naturally and adversarially trained models.
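
The abstract describes the 32.63% figure as 4$\times$ the previous best generative attack and 16$\times$ the instance-specific iterative attack. Taking those factors at face value (they are presumably rounded), the implied baseline rates can be back-calculated. These are inferred values for orientation only, not numbers reported in the source:

$$
\frac{32.63\%}{4} \approx 8.16\%, \qquad \frac{32.63\%}{16} \approx 2.04\%
$$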

Implications and Future Direction

The findings are of practical use for crafting adversarial examples in scenarios such as model vulnerability assessment and robustness testing. Because the method can use unsupervised features, it does not rely on labeled datasets, which may broaden its applicability across domains and data modalities. The work also motivates further research into defense mechanisms against such attacks and into building this kind of robustness into next-generation model designs.

In summary, this paper presents a framework for generating targeted adversarial perturbations that challenges existing approaches to adversarial attacks and is backed by solid empirical validation. The work deepens our understanding of adversarial dynamics and may inform new defense strategies against adversarial vulnerabilities in deep learning architectures.

GitHub

GitHub - Muzammal-Naseer/TTP: Official repository for "On Generating Transferable Targeted Perturbations" (ICCV 2021) (59 stars)