- The paper provides a tight theoretical guarantee for adversarial robustness using Gaussian randomized smoothing under the ℓ2 norm.
- The approach achieves impressive empirical performance, with a certified top-1 accuracy of 49% on ImageNet under ℓ2 perturbations of radius 0.5.
- The work includes efficient Monte Carlo algorithms, enabling practical deployment of certifiably robust classifiers in real-world applications.
Certified Adversarial Robustness via Randomized Smoothing
The paper "Certified Adversarial Robustness via Randomized Smoothing" by Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter proposes a novel approach to enhance the adversarial robustness of classifiers using a technique called randomized smoothing. Their main contribution is providing a tight theoretical guarantee for the robustness of smoothed classifiers against adversarial perturbations in the ℓ2 norm. This essay will summarize the paper, highlight its strong numerical results, and discuss its implications for both theory and practice in the field of adversarial machine learning.
Overview of Randomized Smoothing
Randomized smoothing transforms a base classifier into a smoothed classifier that is certifiably robust to adversarial perturbations. Formally, given a base classifier f and a noise level σ, the smoothed classifier g returns, for an input x, the class that f is most likely to predict when x is corrupted by isotropic Gaussian noise N(0, σ²I).
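In practice, g is evaluated by Monte Carlo sampling: classify many noisy copies of the input and take a majority vote. The snippet below is a minimal sketch of that voting procedure, assuming a PyTorch base classifier that maps a batch of image tensors to class logits; the function and parameter names are illustrative and not taken from the paper's released code.

```python
import numpy as np
import torch

def smoothed_predict(base_classifier, x, sigma, num_samples=100, batch_size=100):
    """Approximate g(x): the class base_classifier predicts most often when x
    (a single image tensor of shape C x H x W) is perturbed by N(0, sigma^2 I)."""
    counts = None
    remaining = num_samples
    with torch.no_grad():
        while remaining > 0:
            n = min(batch_size, remaining)
            batch = x.unsqueeze(0).repeat(n, 1, 1, 1)   # n copies of the input
            noise = torch.randn_like(batch) * sigma     # isotropic Gaussian noise
            logits = base_classifier(batch + noise)
            preds = logits.argmax(dim=1).cpu().numpy()
            if counts is None:
                counts = np.zeros(logits.shape[1], dtype=int)
            counts += np.bincount(preds, minlength=logits.shape[1])
            remaining -= n
    return int(counts.argmax()), counts                 # top class and per-class vote counts
```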
Main Contributions
- Tight Robustness Guarantee: The authors derive a tight robustness guarantee for the smoothed classifier under the ℓ2 norm. If, under Gaussian noise, the base classifier returns the most probable class with probability at least pA and the runner-up class with probability at most pB, then the smoothed classifier's prediction is guaranteed not to change within an ℓ2 ball of radius
R = (σ/2)(Φ−1(pA) − Φ−1(pB))
around the input, where Φ−1 is the inverse standard Gaussian cumulative distribution function and σ is the noise level. The guarantee is tight: no larger radius can be certified from these probability bounds alone.
- Empirical Results on ImageNet: A notable empirical achievement of the paper is the application of randomized smoothing to ImageNet, where the smoothed classifier achieves a certified top-1 accuracy of 49% under adversarial perturbations with ℓ2 norm less than 0.5 (127/255). This result is significant because no prior certified defense had been shown to be feasible at ImageNet scale.
- Comparison with Existing Approaches: The experimental results indicate that randomized smoothing outperforms previously published certified defenses on both smaller-scale datasets (like CIFAR-10) and larger ones (like ImageNet), delivering higher certified accuracies at comparable radii.
- Practical Algorithms: The authors present Monte Carlo algorithms for both predicting with the smoothed classifier and certifying its robustness. These algorithms are essential for practical deployment, since exactly evaluating the smoothed classifier's class probabilities is computationally intractable; a sketch of the certification procedure follows this list.
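When pB is taken to be 1 − pA, the guarantee above simplifies to R = σΦ−1(pA), which is the form the paper's CERTIFY procedure uses: estimate a lower confidence bound on pA from noisy samples, then convert it to a radius. The sketch below follows that outline under the same assumptions as the earlier snippet; smoothed_predict is the hypothetical helper defined there, and the sample sizes and confidence level are illustrative.

```python
from scipy.stats import beta, norm

def certify(base_classifier, x, sigma, n0=100, n=10_000, alpha=0.001):
    """Certify an l2 radius around x for the smoothed classifier.
    Returns (predicted_class, radius), or (None, 0.0) to abstain."""
    # Step 1: guess the top class from a small number of noisy samples.
    guess, _ = smoothed_predict(base_classifier, x, sigma, num_samples=n0)
    # Step 2: with a larger sample, count how often the guessed class wins.
    _, counts = smoothed_predict(base_classifier, x, sigma, num_samples=n)
    k = int(counts[guess])
    # One-sided Clopper-Pearson lower confidence bound on pA at level 1 - alpha.
    pA_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0
    if pA_lower <= 0.5:
        return None, 0.0                       # not confident enough; abstain
    return guess, sigma * norm.ppf(pA_lower)   # R = sigma * Phi^{-1}(pA)
```

Averaged over a test set, the fraction of examples whose certified radius exceeds a threshold r (with the certified class matching the true label) gives the certified accuracy at radius r, which is how the numbers in the next section are reported.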
Numerical Results and Bold Claims
The authors offer strong numerical evidence to support their claims. For instance, on ImageNet, they report the following certified accuracies for different radii r:
- At r=0.5, the certified accuracy is 49% for σ=0.25.
- At r=1.0, the certified accuracy is 37% for σ=0.50.
The empirical improvement over baseline models is substantial, particularly on high-dimensional datasets where other certified defenses struggle to scale.
Implications and Future Directions
This work has several implications:
- Practical Defense Mechanism:
Randomized smoothing, with its robustness guarantees, provides a practical and scalable defense mechanism against adversarial attacks. This is especially crucial for real-world applications like autonomous driving and medical imaging, where robustness is critical.
- Influence on Adversarial Training:
The tight robustness guarantee and the simplicity of the randomized smoothing method could influence future adversarial training procedures, potentially integrating smoothing techniques to enhance robustness further.
- Model-Agnostic Applicability:
Given its model-agnostic nature, randomized smoothing can be applied across various neural architectures, making it a versatile tool in the adversarial machine learning toolkit.
- Future Research Directions:
Theoretical advances in using noise distributions other than the Gaussian to induce robustness against other types of perturbations (e.g., other ℓp norms) are a natural direction for future work. Moreover, improving the efficiency of the sampling algorithms could enhance the practical performance of smoothed classifiers.
Conclusion
The paper by Cohen, Rosenfeld, and Kolter presents a landmark contribution to the field of adversarial robustness. By pairing a tight theoretical guarantee with empirical validation at ImageNet scale, it extends the frontier of how we can protect machine learning models from adversarial attacks. The implications for practical deployment and further theoretical exploration are profound, underlining the importance of their contributions.
For researchers and practitioners alike, the insights and methodologies from this paper offer a robust foundation for advancing the state of adversarially robust machine learning.