Denoised Smoothing: A Provable Defense for Pretrained Classifiers

Published 4 Mar 2020 in cs.LG, cs.CR, cs.CV, and stat.ML | (2003.01908v2)

Abstract: We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. This method, for instance, allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies to both the white-box and the black-box settings of the pretrained classifier. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at: https://github.com/microsoft/denoised-smoothing.




Summary

The paper introduces denoised smoothing, a method that prepends a learned denoiser to a pretrained classifier to provide provable robustness against adversarial attacks.
It combines a denoiser with randomized smoothing to significantly boost certified accuracy, with experiments showing improvements from 4% to over 30% on ImageNet.
The approach enables robust defenses without retraining, making it practical for securing widely used vision APIs and deep learning models.

Denoised Smoothing: A Provable Defense Against Adversarial Attacks

The paper "Denoised Smoothing: A Provable Defense for Pretrained Classifiers" addresses how to make existing deep learning models robust to adversarial attacks. Treating a pretrained image classifier as a fixed component, the authors introduce a method called "denoised smoothing," which gives the existing classifier certified robustness against $\ell_p$ adversarial perturbations.

The primary motivation of the work is the vulnerability of image classifiers to adversarial attacks, where small, often imperceptible perturbations of an input image can drastically change the classification result. Previous defenses were either heuristic, with no provable guarantees, or required expensive retraining of models from scratch specifically for robustness.

Denoised smoothing removes the need to retrain the classifier. Instead, a learned denoiser is prepended to any pretrained image classifier, and randomized smoothing is applied to the combined model. Randomized smoothing is a certified defense that transforms a classifier into a smoothed version, which outputs the class the base classifier is most likely to return when the input is perturbed with Gaussian noise. Because off-the-shelf classifiers are typically not trained on noisy images, the denoiser's role is to remove that Gaussian noise before the prediction step.
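The prediction step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `smoothed_predict`, `toy_denoiser`, and `toy_classifier` are hypothetical names, the toy models stand in for a trained denoiser and a pretrained network, and the sketch takes a plain majority vote rather than the statistical test with abstention used in the randomized smoothing literature.

```python
import numpy as np

def smoothed_predict(classifier, denoiser, x, sigma, n_samples, num_classes, rng):
    """Monte Carlo estimate of the class most often returned for
    denoised, Gaussian-perturbed copies of x (simple majority vote)."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # add Gaussian noise
        label = classifier(denoiser(noisy))               # denoise, then classify
        counts[label] += 1
    return int(np.argmax(counts)), counts

# Hypothetical stand-ins: a real setup would use a trained denoiser
# and an off-the-shelf pretrained classifier (or a remote API call).
def toy_denoiser(z):
    return np.clip(z, 0.0, 1.0)

def toy_classifier(z):
    return int(z.mean() > 0.5)  # class 1 for "bright" images, else class 0

rng = np.random.default_rng(0)
x = np.full((8, 8), 0.9)  # a bright 8x8 "image"
label, counts = smoothed_predict(
    toy_classifier, toy_denoiser, x,
    sigma=0.25, n_samples=200, num_classes=2, rng=rng,
)
print(label, counts)
```

Note that the classifier is only ever called on inputs, never inspected or modified, which is why the same loop applies when the classifier is a black box.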

Experimental Results and Their Significance

The authors validate their approach through extensive experiments on ImageNet and CIFAR-10. The paper reports substantial improvements in certified accuracy without altering the pretrained models themselves. For instance, the certified accuracy of an ImageNet-pretrained ResNet-50 classifier rises from 4% to 31% in the black-box setting and to 33% in the white-box setting.

The paper's tables report certified accuracy across a range of $\ell_2$ radii, showing how the guarantee holds up as the allowed perturbation grows. These results suggest that denoised smoothing narrows the gap between the theoretical guarantees of randomized smoothing and practical use with existing pretrained models.
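To make "certified accuracy at an $\ell_2$ radius" concrete, the sketch below shows one way such a number can be computed from Monte Carlo vote counts. It is an illustration under stated assumptions: the function names and the example records are hypothetical, and the lower confidence bound uses a simple Hoeffding inequality in place of the tighter Clopper-Pearson interval used in the randomized smoothing literature. The radius $\sigma \, \Phi^{-1}(p)$ is the standard randomized smoothing certificate for a lower bound $p > 0.5$ on the top-class probability.

```python
import math
from statistics import NormalDist

def certified_radius(top_count, n_samples, sigma, alpha=0.001):
    """Return the certified L2 radius sigma * Phi^{-1}(p_lower),
    or None (abstain) when the lower bound does not exceed 0.5."""
    p_hat = top_count / n_samples
    # One-sided Hoeffding lower bound, valid with probability >= 1 - alpha.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    if p_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_lower)

def certified_accuracy(records, radius):
    """Fraction of examples that are classified correctly AND certified
    at `radius` or more. records: list of (is_correct, radius_or_None)."""
    hits = sum(1 for ok, r in records if ok and r is not None and r >= radius)
    return hits / len(records)

# Made-up vote counts and records, for illustration only.
print(certified_radius(top_count=990, n_samples=1000, sigma=0.25))
print(certified_radius(top_count=520, n_samples=1000, sigma=0.25))  # abstains
records = [(True, 0.37), (True, 0.10), (False, 0.50), (True, None)]
print(certified_accuracy(records, radius=0.25))
```

Sweeping `radius` over a grid and plotting `certified_accuracy` yields the kind of accuracy-versus-radius curve that certified-robustness tables summarize.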

Implications and Future Directions

The practical implications of this work are substantial, especially considering the prevalence of pretrained models in both academic and industrial applications. It allows practitioners to transform existing non-robust public APIs into robust versions without accessing or modifying the internal workings of these APIs. For example, the paper successfully tests denoised smoothing on well-known vision APIs such as Azure, Google, AWS, and ClarifAI.

On the theoretical side, the approach shows that an input transformation can be part of a defense that retains provable robustness. This contrasts with many input-transformation defenses, which offer only empirical robustness and fail against adaptive attacks.
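The reason the guarantee survives the added transformation can be stated compactly. The following restates the standard randomized smoothing certificate with the base classifier replaced by the composition of classifier and denoiser. The notation is introduced here for illustration and is not taken from the summary above: $f$ is the pretrained classifier, $D_\theta$ the denoiser, $\sigma$ the noise level, $\Phi^{-1}$ the inverse standard normal CDF, $p_A$ a lower bound on the probability of the top class, and $p_B$ an upper bound on the probability of the runner-up class.

```latex
g(x) = \arg\max_{c} \;
  \mathbb{P}_{\delta \sim \mathcal{N}(0, \sigma^2 I)}
  \big[ f(D_\theta(x + \delta)) = c \big]

g(x + \epsilon) = g(x)
  \quad \text{for all } \|\epsilon\|_2 < R,
  \qquad
  R = \frac{\sigma}{2} \left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right)
```

The certificate makes no assumption about the base classifier, so it applies to $f \circ D_\theta$ exactly as it would to $f$ alone. The denoiser affects how large $p_A$ is in practice, and therefore how large $R$ is, but not whether the guarantee is valid.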

Future research could explore improving the denoiser to raise certified accuracy further, or generalizing denoised smoothing beyond image classifiers to other types of input data and neural network architectures. Extending denoised smoothing to other $\ell_p$ threat models is another promising avenue, as the authors briefly mention potential extensions to threat models such as $\ell_1$.

In conclusion, "Denoised Smoothing: A Provable Defense for Pretrained Classifiers" offers an accessible and effective way to add certified robustness against adversarial perturbations to existing deep learning models. The work lays groundwork for future efforts to make AI systems more reliable and secure in adversarial environments.
