- The paper presents the Neural Representation Purifier (NRP), a purifier network trained with a self-supervised mechanism, to enhance adversarial robustness across multiple vision tasks.
- Training relies on feature-space perturbations rather than task-specific labels, and the purifier largely preserves accuracy on clean data.
- Experimental results demonstrate robust defense against state-of-the-art black-box attacks, reducing attack success rates significantly.
Insights into Adversarial Robustness Through Self-supervision
The paper, "A Self-supervised Approach for Adversarial Robustness," presents a unique method for enhancing adversarial robustness in deep neural networks (DNNs) using a self-supervised paradigm. The research addresses a crucial vulnerability in DNN-based vision systems, where minor perturbations to input data—adversarial examples—can lead to severe misclassifications. The proposed approach aims to leverage the advantages of adversarial training (AT) and input processing, which traditionally face limitations in scalability, computational cost, accuracy, and task generalization.
Adversarial examples exhibit strong transferability across models, which demands defenses that guard against a broad spectrum of attacks and tasks such as classification, segmentation, and object detection. Prevailing adversarial training techniques often lack the generalizability required for effective cross-task defense and reduce accuracy on clean data distributions.
The authors introduce a novel self-supervised adversarial training mechanism, the Neural Representation Purifier (NRP), which operates directly in the input space and can be applied post hoc to protect diverse vision systems. The NRP is trained on self-supervised perturbations that maximize distortion in the feature space; because this signal does not rely on task-specific labels or loss functions, the resulting defense generalizes across tasks.
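To make the mechanism concrete, below is a minimal sketch of such a training loop in PyTorch. It is written under assumptions rather than from the authors' released code: a frozen VGG block stands in for the feature extractor, a small convolutional network stands in for the purifier, and the loss terms and hyperparameters are illustrative. The point it demonstrates is that the adversarial signal comes purely from maximizing feature distortion, with no task labels involved.

```python
# Illustrative sketch of self-supervised perturbation + purifier training.
# All names (vgg, purifier, train_step, ...) are placeholders, not the
# authors' code; hyperparameters are illustrative.
import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Fixed feature extractor (early VGG-16 layers, requires torchvision >= 0.13);
# its weights stay frozen -- only its feature space defines the training signal.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16]
vgg = vgg.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def self_supervised_perturb(x, eps=16 / 255, steps=5, alpha=4 / 255):
    """Craft a perturbation that maximizes feature distortion w.r.t. the clean image."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    clean_feats = vgg(x).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        distortion = F.mse_loss(vgg(x_adv), clean_feats)
        grad = torch.autograd.grad(distortion, x_adv)[0]
        # Ascend on feature distortion -- no labels anywhere.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

# The purifier is any image-to-image network; a tiny conv net stands in here.
purifier = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).to(device)
opt = torch.optim.Adam(purifier.parameters(), lr=1e-4)

def train_step(x):
    x = x.to(device)
    x_adv = self_supervised_perturb(x)
    purified = purifier(x_adv)
    # Train the purifier to undo the feature distortion (plus a pixel-level term).
    loss = F.mse_loss(vgg(purified), vgg(x).detach()) + F.l1_loss(purified, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because the objective is defined entirely in feature space over unlabeled images, the same loop applies to any input distribution, which is what makes the resulting purifier task-agnostic.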
Key Contributions
- Task Generalizability: The introduction of NRP allows for a task-independent adversarial training mechanism. Once trained, NRP is a versatile component that can safeguard various vision tasks without additional retraining, demonstrating its efficacy on classification, detection, and segmentation tasks (see the deployment sketch after this list).
- Self-supervised Signal: By training the NRP with feature distortion as the self-supervised signal, the method avoids the pitfalls of label leakage, enhancing generalization against unseen adversarial attacks.
- Transferability and Robustness: The paper highlights the NRP's ability to recover inputs effectively from state-of-the-art black-box adversarial attacks, such as the translation-invariant ensemble attack, showing a significant reduction in attack success rate compared to prior methods.
- Maintaining Clean Image Accuracy: While traditional adversarial training approaches result in a notable drop in accuracy for clean images, the NRP exhibits minimal impact on the accuracy of non-adversarial examples, maintaining the performance baseline of the original model.
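As a hypothetical deployment sketch (the function and model names below are placeholders, not from the paper or its codebase), using the trained purifier amounts to prepending it to any downstream model at inference time, with no retraining of either network:

```python
# Drop-in pre-processing with a trained purifier; `purifier` is the network
# sketched earlier and `task_model` is any unmodified downstream model.
import torch

@torch.no_grad()
def defended_forward(task_model, purifier, x):
    """Purify possibly-adversarial inputs, then run the unchanged task model."""
    return task_model(purifier(x))

# Example: the same purifier front-ends a classifier and a segmentation model.
# logits = defended_forward(classifier, purifier, images)
# masks  = defended_forward(segmenter, purifier, images)
```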
Implications and Future Directions
The approach outlined in this paper not only offers a scalable and efficient route to adversarial robustness but also sets a precedent for adversarial defenses that do not depend on exhaustive labeled datasets. It suggests a move towards more generalized frameworks in which systems learn to protect themselves independently of the underlying task or data specifics.
Future research can build on this foundation by integrating self-supervised adversarial training into larger training and deployment pipelines, or by assessing its utility in AI applications beyond vision. Additionally, further investigation into the theoretical underpinnings of feature-space distortions might offer deeper insights into optimizing such defenses.
In conclusion, this work represents a significant step towards adversarially robust networks and broadens the path to deploying safe and reliable DNNs in real-world scenarios.