Enhancing Adversarial Example Transferability with an Intermediate Level Attack

Published 23 Jul 2019 in cs.LG, cs.CR, cs.CV, and stat.ML | (1907.10823v3)

Abstract: Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps. Our code is available at https://github.com/CUVL/Intermediate-Level-Attack.

Authors (6)

Citations (220)

Summary

The paper introduces ILA, a novel framework that enhances adversarial transferability by refining perturbations at intermediate model layers.
ILA employs two variants, ILAP and ILAF, to optimize both perturbation direction and magnitude, demonstrating improved attacks on CIFAR-10 and ImageNet.
Choosing the intermediate layer from disturbance peaks measured on the source model alone often yields optimal or near-optimal black-box transferability.

Enhancing Adversarial Example Transferability with an Intermediate Level Attack

The paper "Enhancing Adversarial Example Transferability with an Intermediate Level Attack" introduces an attack framework designed to improve the transferability of adversarial examples across different neural network models. It starts from the observation that adversarial examples tend to overfit to the source model used to craft them, which limits their effectiveness against target models that differ from that source.

Objective and Methodology

The principal objective of the paper is to improve black-box transferability. To this end, the authors propose the Intermediate Level Attack (ILA), which fine-tunes an existing adversarial example so that it fools a wider range of models. ILA increases the perturbation that the example induces in the feature maps of a pre-specified intermediate layer of the source model, aiming to retain the original adversarial direction while increasing the perturbation's magnitude in a way that favors transfer.

ILA operates under two variants:

ILAP (Intermediate Level Attack Projection): Emphasizes maintaining the original adversarial direction by maximizing projection. The loss is based on the dot product between two feature-space perturbations at the chosen layer: the one induced by the initial adversarial example and the one induced by the refined example, each measured relative to the clean input's features.
ILAF (Intermediate Level Attack Flexible): Adds a weighting parameter that balances adherence to the initial direction against the magnitude of the perturbation at that layer, allowing further optimization and potentially better transfer rates.
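
The two losses can be sketched in NumPy as follows. This is a minimal illustration based on a reading of the paper's definitions, not the authors' code: the function names, the `alpha` default, and the `eps` guard against division by zero are assumptions of this sketch. The inputs stand for the chosen layer's feature maps for the clean input, the initial adversarial example, and the refined example.

```python
import numpy as np

def ilap_loss(feat_clean, feat_init_adv, feat_refined):
    """Projection loss: negative dot product of the two feature-space perturbations."""
    d_init = (feat_init_adv - feat_clean).ravel()  # perturbation from the initial attack
    d_new = (feat_refined - feat_clean).ravel()    # perturbation from the refined example
    return -float(np.dot(d_new, d_init))

def ilaf_loss(feat_clean, feat_init_adv, feat_refined, alpha=1.0, eps=1e-12):
    """Flexible loss: alpha weights relative magnitude against directional agreement."""
    d_init = (feat_init_adv - feat_clean).ravel()
    d_new = (feat_refined - feat_clean).ravel()
    n_init = np.linalg.norm(d_init) + eps  # eps avoids division by zero (assumption)
    n_new = np.linalg.norm(d_new) + eps
    magnitude = n_new / n_init                           # how much larger the new perturbation is
    direction = np.dot(d_new / n_new, d_init / n_init)   # cosine similarity of the two directions
    return -float(alpha * magnitude + direction)

# Toy feature vectors (made-up values, for illustration only).
clean = np.zeros(3)
init_adv = np.array([1.0, 0.0, 0.0])
aligned = np.array([2.0, 0.0, 0.0])     # same direction, twice the magnitude
orthogonal = np.array([0.0, 2.0, 0.0])  # same magnitude as `aligned`, different direction
```

Minimizing either loss pushes the refined example further along the initial direction in feature space. With ILAF, a larger `alpha` rewards magnitude even when the direction drifts, which is why `orthogonal` still receives a negative loss under ILAF but a loss of zero under ILAP.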

Experimental Results

ILA was evaluated empirically on both CIFAR-10 and ImageNet using established models such as ResNet18, DenseNet121, SENet18, and GoogLeNet. Starting from adversarial examples generated by standard attacks (e.g., I-FGSM, MI-FGSM, Carlini-Wagner) and refining them with ILA, the authors demonstrated improved transferability to other model architectures. Notably, ILA showed gains over state-of-the-art transfer attacks such as TAP and the DI2-FGSM attack of Xie et al. on ImageNet.
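
The refinement step described above (take an adversarial example from a baseline attack, then fine-tune it against an intermediate layer) can be sketched on a toy problem. Everything here is an assumption made for illustration: a single linear map `W` stands in for the source model up to the chosen layer so that the gradient of the projection loss can be written by hand, the baseline attack is simulated by a random sign perturbation, and the names `ila_refine_linear`, `eps`, `step`, and `n_iter` are hypothetical. The sketch starts the refined example at the clean input, which is also an assumption. With a real network one would obtain the gradient by automatic differentiation instead.

```python
import numpy as np

def ila_refine_linear(W, x, x_adv0, eps=0.03, step=0.005, n_iter=10):
    """Sign-gradient refinement of x_adv0 under the projection loss, for features F(x) = W @ x."""
    d_init = W @ (x_adv0 - x)  # feature-space perturbation of the initial example (fixed guide)
    x_ref = x.copy()           # refined example, started at the clean input (assumption)
    for _ in range(n_iter):
        # Loss: -(W @ (x_ref - x)) . d_init, so its gradient w.r.t. x_ref is -W.T @ d_init.
        # It is constant for a linear map; it is recomputed here to mirror the general loop.
        grad = -W.T @ d_init
        x_ref = x_ref - step * np.sign(grad)      # descend the loss with a sign-gradient step
        x_ref = np.clip(x_ref, x - eps, x + eps)  # stay inside the L-infinity ball around x
        x_ref = np.clip(x_ref, 0.0, 1.0)          # stay inside the valid input range
    return x_ref

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                          # toy "layer" weights
x = rng.uniform(0.2, 0.8, size=16)                    # toy clean input, away from the [0, 1] edges
x_adv0 = x + 0.03 * rng.choice([-1.0, 1.0], size=16)  # stand-in for a baseline attack's output

x_ref = ila_refine_linear(W, x, x_adv0)
d_init = W @ (x_adv0 - x)
proj_init = float(d_init @ d_init)            # projection achieved by the initial example
proj_ref = float((W @ (x_ref - x)) @ d_init)  # projection achieved by the refined example
print(proj_init, proj_ref)
```

In this linear toy setting the refined example's projection onto the initial feature-space direction is at least as large as the initial example's, while the input-space perturbation stays within the same `eps` bound. Whether a larger projection translates into better transfer is an empirical question that the paper's experiments address; the toy example does not show it.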

The paper also highlights the significance of selecting the correct intermediate layer to target for perturbations, as this choice impacts the transferability of the adversarial examples. Experiments revealed that choosing layers that displayed late peaks in disturbance values often led to optimal or near-optimal transferability. This insight underpins a proposed strategy for pre-selecting effective layers using the source model exclusively, which can simplify and automate the tuning process for enhancing transfer attacks.
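
The layer-selection idea can be illustrated with a small helper. It is a sketch under stated assumptions, not the paper's procedure: it assumes one disturbance value per candidate layer has already been measured on the source model, it defines a peak as a value strictly above its left neighbor and not below its right neighbor, and it falls back to the overall maximum when no interior peak exists. The function name and the sample values are hypothetical.

```python
import numpy as np

def latest_peak_layer(disturbance):
    """Return the index of the last local peak in a per-layer disturbance curve."""
    d = np.asarray(disturbance, dtype=float)
    # Interior indices whose value rises from the left and does not rise to the right.
    peaks = [i for i in range(1, len(d) - 1) if d[i] > d[i - 1] and d[i] >= d[i + 1]]
    # If the curve has no interior peak (e.g. it is monotone), use the overall maximum.
    return peaks[-1] if peaks else int(np.argmax(d))

# Made-up disturbance values for six candidate layers.
curve = [0.1, 0.5, 0.3, 0.8, 0.6, 0.4]
print(latest_peak_layer(curve))  # 3
```

Because the helper only reads values computed on the source model, it needs no access to any target model, which matches the selection strategy described above.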

Theoretical Implications and Future Directions

The approach outlined in this paper aligns with ongoing research into understanding and manipulating the feature representation space of neural networks. ILA's ability to increase adversarial transferability has both practical and theoretical ramifications. Practically, it raises security concerns for systems that rely on machine learning models, since black-box attacks become more potent. Theoretically, it offers insight into how decision boundaries align across different models, suggesting underlying commonalities in deep feature representations.

Future research could focus on further refinement of ILA, potentially exploring its application to targeted adversarial attacks and universal perturbations. Additionally, understanding how different architectures contribute to variations in transferability can foster more robust adversarial defenses, aiming to generalize beyond specific attack patterns.

In conclusion, the Intermediate Level Attack framework advances the crafting of adversarial examples that are effective not only against the model they were built on but also transfer well across diverse neural network architectures. This research enriches our understanding of adversarial dynamics and sets the stage for further exploration of intermediate feature space manipulation as a tool for studying adversarial robustness.
