Defense against Universal Adversarial Perturbations

Published 16 Nov 2017 in cs.CV | (1711.05929v3)

Abstract: Recent advances in Deep Learning show the existence of image-agnostic quasi-imperceptible perturbations that when applied to `any' image can fool a state-of-the-art network classifier to change its prediction about the image label. These `Universal Adversarial Perturbations' pose a serious threat to the success of Deep Learning in practice. We present the first dedicated framework to effectively defend the networks against such perturbations. Our approach learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a targeted model, such that the targeted model needs no modification. The PRN is learned from real and synthetic image-agnostic perturbations, where an efficient method to compute the latter is also proposed. A perturbation detector is separately trained on the Discrete Cosine Transform of the input-output difference of the PRN. A query image is first passed through the PRN and verified by the detector. If a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. A rigorous evaluation shows that our framework can defend the network classifiers against unseen adversarial perturbations in the real-world scenarios with up to 97.5% success rate. The PRN also generalizes well in the sense that training for one targeted network defends another network with a comparable success rate.

Authors (3)

Citations (202)

Summary

The paper introduces a Perturbation Rectifying Network (PRN) that rectifies adversarial inputs so that the unmodified classifier can again predict the correct label.
It trains the PRN on both real and synthetic perturbations and adds a DCT-based detector to identify perturbed inputs.
The framework is evaluated on CaffeNet, VGG-F, and GoogLeNet, and a PRN trained for one network also defends another against universal adversarial perturbations (UAPs).

Defense against Universal Adversarial Perturbations: An Expert Overview

The paper "Defense against Universal Adversarial Perturbations" addresses a central challenge in deep learning and computer vision: the vulnerability of deep neural networks to universal adversarial perturbations (UAPs). A UAP is a quasi-imperceptible, image-agnostic perturbation: a single pattern that, when added to any image, can change the label predicted by a state-of-the-art classifier. This poses a serious threat to deploying such networks in real-world scenarios.
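To make "image-agnostic" concrete, the following minimal sketch adds one fixed perturbation to every image in a small set and measures how often the predicted label changes. The names `add_perturbation`, `fooling_rate`, and `toy_classifier` are hypothetical, the images are flattened to short vectors, and the classifier is a toy stand-in rather than a deep network; none of this comes from the paper's code.

```python
from typing import Callable, List, Sequence

Vector = List[float]  # a flattened image (assumption: toy 2-pixel images)


def add_perturbation(image: Sequence[float], perturbation: Sequence[float]) -> Vector:
    """Add the same perturbation to an image, element by element."""
    return [p + d for p, d in zip(image, perturbation)]


def fooling_rate(
    images: Sequence[Sequence[float]],
    perturbation: Sequence[float],
    classifier: Callable[[Sequence[float]], int],
) -> float:
    """Fraction of images whose predicted label changes under ONE shared perturbation."""
    fooled = sum(
        classifier(add_perturbation(img, perturbation)) != classifier(img)
        for img in images
    )
    return fooled / len(images)


def toy_classifier(image: Sequence[float]) -> int:
    """Hypothetical stand-in classifier: label 1 if the pixel sum is positive."""
    return 1 if sum(image) > 0 else 0


images = [[0.2, -0.3], [-0.1, -0.1], [0.5, 0.4], [-0.4, 0.1]]
universal = [0.25, 0.25]  # one perturbation reused for every image
rate = fooling_rate(images, universal, toy_classifier)
print(rate)  # 0.75: three of the four toy images change label
```

The point of the sketch is only that the perturbation is computed once and reused across images; a real UAP would additionally be constrained to be small enough to be quasi-imperceptible.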

Contribution and Methodology

The authors present a novel framework to defend against UAPs by introducing a Perturbation Rectifying Network (PRN). The PRN serves as a pre-input layer to the targeted network model, enabling effective defense without requiring any modifications to the existing network architecture. This approach leverages real and synthetically generated perturbations for training.

The paper's methodology involves two primary components:

Perturbation Rectifying Network (PRN): This network acts as a set of pre-input layers that rectify perturbed images so that the original classifier can predict the correct label from potentially adversarial inputs. The PRN is trained in front of the target network on both clean and perturbed images, while the target network itself needs no modification.
Perturbation Detector: A detector is trained to identify adversarial perturbations by analyzing the Discrete Cosine Transform (DCT) of the difference between the input and output of the PRN. This binary classifier helps decide whether the rectified image should replace the input for classification.
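The inference-time flow of these two components can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `defend`, `toy_prn`, `toy_detector`, and `toy_classifier` are hypothetical names, the toy stand-ins replace trained networks, and the DCT is a plain unnormalised DCT-II written out by hand for a single-channel image.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]  # single-channel image as rows of pixel values


def dct_1d(x: List[float]) -> List[float]:
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_2d(img: Image) -> Image:
    """Separable 2D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def defend(
    image: Image,
    prn: Callable[[Image], Image],
    detector: Callable[[Image], bool],
    classifier: Callable[[Image], int],
) -> Tuple[int, bool]:
    """Rectify, detect on the DCT of (input - PRN output), then classify."""
    rectified = prn(image)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(image, rectified)]
    flagged = detector(dct_2d(diff))
    # Use the rectified image only when a perturbation is detected.
    return classifier(rectified if flagged else image), flagged


# Toy stand-ins (assumptions for illustration only, not trained models).
def toy_prn(img: Image) -> Image:
    return [[float(round(v)) for v in row] for row in img]


def toy_detector(coeffs: Image) -> bool:
    return sum(abs(c) for row in coeffs for c in row) > 0.01


def toy_classifier(img: Image) -> int:
    pixels = [v for row in img for v in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


clean = [[1.0, 0.0], [0.0, 1.0]]
perturbed = [[v + 0.1 for v in row] for row in clean]
print(toy_classifier(perturbed))                                     # 1 (label flipped)
print(defend(clean, toy_prn, toy_detector, toy_classifier))          # (0, False)
print(defend(perturbed, toy_prn, toy_detector, toy_classifier))      # (0, True)
```

Keeping the classifier as an opaque callable mirrors the design described above: the defense sits in front of the target model and never touches its internals.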

Additionally, the paper describes a method to efficiently generate synthetic perturbations, drawing upon theoretical insights regarding the decision boundary's vulnerabilities in neural networks. This aspect enhances the dataset of perturbations used for training the PRN, potentially improving defense robustness.

Experimental Results

The framework was evaluated using CaffeNet, VGG-F, and GoogLeNet, and proved highly effective against UAPs. The experiments used real perturbations together with synthetic perturbations generated by the authors' method. Key findings from the experiments include:

A high PRN gain was achieved, indicating a substantial improvement in classification accuracy on rectified images compared to perturbed inputs.
The framework showed strong detection and defense rates; the abstract reports a defense success rate of up to 97.5% against unseen adversarial perturbations.
Cross-model generalization was observed: a PRN trained for one targeted network defended another network with a comparable success rate.
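As a rough illustration of how quantities of this kind can be computed, the sketch below measures an accuracy gain from rectification and a detector's hit rate. The function names and the exact formulas are assumptions made for this example; the paper defines its own metrics, which may differ from these simplified versions.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def rectification_gain(
    pred_perturbed: Sequence[int],
    pred_rectified: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Assumed metric: accuracy on rectified inputs minus accuracy on perturbed inputs."""
    return accuracy(pred_rectified, labels) - accuracy(pred_perturbed, labels)


def detection_rate(flags: Sequence[bool], is_perturbed: Sequence[bool]) -> float:
    """Assumed metric: fraction of inputs the detector labels correctly."""
    if len(flags) != len(is_perturbed) or not flags:
        raise ValueError("flags and is_perturbed must be non-empty and equal length")
    return sum(f == t for f, t in zip(flags, is_perturbed)) / len(flags)


# Made-up toy predictions, not results from the paper.
labels = [0, 1, 2, 1]
pred_perturbed = [3, 3, 2, 3]   # classifier fooled on three of four images
pred_rectified = [0, 1, 2, 3]   # rectification restores two of them
print(rectification_gain(pred_perturbed, pred_rectified, labels))  # 0.5
print(detection_rate([True, True, False, False], [True, True, True, False]))  # 0.75
```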

Implications and Future Work

This paper's findings have critical implications for the deployment of deep learning models in environments where adversarial attacks are a concern. The ability to integrate a defense mechanism externally, without modifying the core architecture of the target network, is highly beneficial for maintaining the operational efficiency of already deployed models.

Further work could extend the approach to other kinds of adversarial attacks and to tasks beyond image classification, such as detection and segmentation. Studying how well different perturbation types transfer across models, and further optimizing synthetic perturbation generation, could also give deeper insight into improving model robustness.

In conclusion, the paper presents a well-structured and effective strategy for defending neural networks against UAPs, which is a significant advancement in ensuring the reliability and security of AI systems in practical applications.
