Feature Denoising for Improving Adversarial Robustness (1812.03411v2)

Published 9 Dec 2018 in cs.CV

Abstract: Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 --- it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ~10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.

Citations (858)

Summary

The paper demonstrates that feature denoising blocks improve CNN adversarial robustness by reducing noise in internal feature maps.
Each denoising block wraps a denoising operation (non-local means in the best-performing variant) with a 1x1 convolution and a residual connection, so that useful signal is preserved while noise is suppressed.
Experiments on ImageNet show 55.7% accuracy under 10-iteration white-box PGD attacks, along with state-of-the-art robustness in black-box tests.

Feature Denoising for Improving Adversarial Robustness

The paper "Feature Denoising for Improving Adversarial Robustness" by Xie et al. proposes improving the adversarial robustness of convolutional neural networks (CNNs) by denoising their intermediate features. The paper observes that adversarial perturbations on images, although small in pixel space, induce substantial noise in the internal feature maps of CNNs, which degrades their predictions. To suppress this feature noise, the authors insert feature denoising blocks into the network architecture and train the whole network end-to-end.

Methodology

The core proposition of the paper is the enhancement of CNN architectures through specially designed denoising blocks. These blocks are formulated to reduce noise in feature maps introduced through adversarial perturbations. The authors consider various denoising operations, such as non-local means, bilateral filters, mean filters, and median filters, and investigate their efficacy in improving adversarial robustness.
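To make the simpler filters concrete, the sketch below applies a 3x3 mean filter and a 3x3 median filter to a toy single-channel feature map containing one noisy spike. This is an illustration only, not the authors' implementation: the helper `filter_3x3`, the toy data, and the edge handling (using only the neighbours that exist) are assumptions made for this example.

```python
from statistics import median


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def filter_3x3(feature_map, reduce_fn):
    """Apply reduce_fn to each 3x3 neighbourhood of a 2D feature map.

    Hypothetical helper: border positions use only the neighbours that exist.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                feature_map[a][b]
                for a in range(max(0, i - 1), min(h, i + 2))
                for b in range(max(0, j - 1), min(w, j + 2))
            ]
            out[i][j] = reduce_fn(window)
    return out


# A flat feature map with a single spike of "noise" in the centre.
noisy = [[1.0] * 5 for _ in range(5)]
noisy[2][2] = 10.0

mean_filtered = filter_3x3(noisy, mean)      # spreads the spike over its neighbours
median_filtered = filter_3x3(noisy, median)  # removes the isolated spike entirely
```

In this toy case the mean filter only dilutes the spike, while the median filter discards it; which behaviour helps robustness in a real network is an empirical question that the paper addresses through its comparisons.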

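A generic denoising block wraps a denoising operation, such as non-local means, with a 1x1 convolutional layer and a residual connection: the denoising operation is followed by the 1x1 convolution, and the result is added back to the block's input. The design is influenced by self-attention mechanisms and non-local networks, and it aims to preserve crucial signal information while suppressing noise.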
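The block structure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the feature map is flattened into a list of per-position vectors, the non-local weights are a softmax over scaled dot products, and the names `nonlocal_means`, `conv1x1`, and `denoising_block` as well as the toy weights are hypothetical.

```python
import math


def nonlocal_means(x):
    """Softmax-weighted non-local means over all positions.

    x is a list of N feature vectors (one per spatial position), each of length C.
    Each output vector is a weighted average of all input vectors.
    """
    c = len(x[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for xi in x:
        scores = [scale * sum(a * b for a, b in zip(xi, xj)) for xj in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(wt * xj[k] for wt, xj in zip(weights, x)) / total
            for k in range(c)
        ])
    return out


def conv1x1(x, weight):
    """1x1 convolution: the same linear map (C_out x C_in) applied at every position."""
    return [[sum(w * v for w, v in zip(row, xi)) for row in weight] for xi in x]


def denoising_block(x, weight):
    """Residual block: output = x + conv1x1(nonlocal_means(x))."""
    denoised = conv1x1(nonlocal_means(x), weight)
    return [[a + b for a, b in zip(xi, di)] for xi, di in zip(x, denoised)]


# Toy input: four positions, two channels (values are made up).
features = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
identity_w = [[1.0, 0.0], [0.0, 1.0]]

out_zero = denoising_block(features, zero_w)    # block reduces to the identity
out_id = denoising_block(features, identity_w)  # input plus its denoised version
```

With all-zero 1x1 weights the block passes its input through unchanged, which illustrates how the residual path retains the original signal while the 1x1 convolution controls how much of the denoised signal is mixed in.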

Experimental Results

The paper provides an extensive evaluation of the proposed method's effectiveness against both white-box and black-box adversarial attacks on the ImageNet dataset.

White-Box Attacks

In white-box settings, adversarial robustness is tested using Projected Gradient Descent (PGD) attacks with varying iteration counts. The results indicate that integrating feature denoising substantially improves robustness. For instance, the denoising model achieves 55.7% accuracy under a 10-iteration PGD attack, compared to 27.9% for the prior state of the art, Adversarial Logit Pairing (ALP). Even under extreme 2000-iteration PGD attacks, the model maintains 42.6% accuracy.
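For readers unfamiliar with PGD, the following sketch runs an L-infinity PGD attack against a toy logistic-regression model. It only illustrates the iterate-and-project loop: the model, the untargeted loss, and the values of `epsilon`, `step_size`, and `num_iters` are assumptions chosen for this example and do not reproduce the paper's ImageNet attack configuration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(x, y, w, b):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x0, y, w, b, epsilon, step_size, num_iters):
    """Untargeted L-infinity PGD: ascend the loss, then project back into the ball."""
    x = list(x0)
    for _ in range(num_iters):
        _, grad = loss_and_grad(x, y, w, b)
        # Gradient-sign ascent step on the loss.
        x = [xi + step_size * sign(g) for xi, g in zip(x, grad)]
        # Project onto the epsilon-ball around the clean input.
        x = [min(max(xi, x0i - epsilon), x0i + epsilon) for xi, x0i in zip(x, x0)]
        # Keep values in the valid [0, 1] input range.
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


# Toy model and input (all values are made up for illustration).
w, b = [2.0, -3.0, 1.0], 0.1
x_clean, label = [0.8, 0.2, 0.5], 1

x_adv = pgd_attack(x_clean, label, w, b, epsilon=0.1, step_size=0.02, num_iters=10)
clean_loss, _ = loss_and_grad(x_clean, label, w, b)
adv_loss, _ = loss_and_grad(x_adv, label, w, b)
```

Increasing `num_iters` gives the attacker more steps to search within the same perturbation budget, which is why evaluating at very large iteration counts is a stricter test of a defense.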

Ablation studies indicate that, among the denoising operations compared, the non-local (Gaussian) variant yields the largest robustness improvement. They also highlight the importance of the 1x1 convolution and the residual connection within the denoising block: removing either one markedly degrades robustness.
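For reference, the non-local means operation is commonly written in the following general form, where x is the input feature map, y the denoised output, and i, j index spatial positions. The notation below follows the standard non-local formulation; the symbols θ, φ (1x1-convolution embeddings of x), d (number of channels), and N (number of positions) are stated here as assumptions about the setup rather than quoted from the summary above.

```latex
% General non-local means: each output is a normalized weighted sum over all positions
y_i = \frac{1}{\mathcal{C}(x)} \sum_{j} f(x_i, x_j)\, x_j

% Gaussian (softmax) weighting
f(x_i, x_j) = \exp\!\left( \frac{1}{\sqrt{d}}\, \theta(x_i)^{\top} \phi(x_j) \right),
\qquad \mathcal{C}(x) = \sum_{j} f(x_i, x_j)

% Dot-product weighting
f(x_i, x_j) = x_i^{\top} x_j, \qquad \mathcal{C}(x) = N
```

Under the Gaussian weighting the normalized weights form a softmax over positions, so each output feature is a convex combination of the input features.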

Black-Box Attacks

In the black-box scenario, the model is evaluated against the top 5 attackers from the CAAD 2017 competition. Under a stringent "all-or-nothing" criterion, in which an image counts as correctly classified only if the model classifies every adversarial version of it correctly across all attackers, the model reaches 49.5% accuracy, significantly outperforming baseline methods and the best-performing entries from the previous year.
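The "all-or-nothing" criterion can be sketched as follows, assuming it counts an image as correct only when the model's prediction is right under every attacker. The function name, the data layout (`predictions[a][i]` is the predicted class of image `i` under attacker `a`), and the toy values are hypothetical.

```python
def all_or_nothing_accuracy(predictions, labels):
    """Fraction of images classified correctly under every attacker.

    predictions[a][i] is the predicted class for image i under attacker a.
    """
    correct = 0
    for i, label in enumerate(labels):
        if all(attacker_preds[i] == label for attacker_preds in predictions):
            correct += 1
    return correct / len(labels)


labels = [0, 1, 2, 3]
predictions = [
    [0, 1, 2, 3],  # attacker A fools no image
    [0, 1, 0, 3],  # attacker B fools image 2
    [0, 2, 2, 3],  # attacker C fools image 1
]

accuracy = all_or_nothing_accuracy(predictions, labels)
per_attacker = [
    sum(p == y for p, y in zip(preds, labels)) / len(labels)
    for preds in predictions
]
```

In this toy case each attacker alone leaves at least 75% of images correct, yet the all-or-nothing accuracy is 50%, which shows why the criterion is stricter than any single-attacker accuracy.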

In the CAAD 2018 competition, the method took first place, with 50.6% accuracy against 48 unknown attackers on a secret, ImageNet-like test dataset, roughly 10 percentage points ahead of the runner-up. This result suggests that the feature denoising approach holds up against attacks that were not known when the defense was designed.

Implications and Future Work

Integrating feature denoising blocks into CNNs improves adversarial robustness in both the white-box and black-box settings evaluated, and it suggests feature denoising as an architectural design principle for future robust models.

Conceptually, the work contributes to understanding how adversarial perturbations affect internal feature representations and how those effects can be mitigated. Practically, feature denoising could be applied in domains where adversarial robustness is critical, such as autonomous driving, medical imaging, and security systems.

Future work could explore further refinements of denoising mechanisms, adaptive denoising strategies based on feature distributions, and extending this approach to other types of adversarial attacks and domains. Additionally, the trade-offs between clean performance and adversarial robustness remain an interesting research avenue, particularly for applications requiring both high accuracy and robustness.

In summary, the paper presents an empirically validated approach to improving adversarial robustness in CNNs through feature denoising, supported by white-box and black-box results on ImageNet and a first-place finish in CAAD 2018.

GitHub

GitHub - facebookresearch/ImageNet-Adversarial-Training: ImageNet classifier with state-of-the-art adversarial robustness (680 stars)