- The paper presents rigorous evaluation practices that emphasize adaptive attacks and reproducibility to accurately assess adversarial defenses.
- It stresses the importance of a clearly defined threat model and diverse attack strategies to avoid common evaluation pitfalls.
- Transparent reporting of hyperparameters and performance metrics advances the development of secure and reliable machine learning systems.
On Evaluating Adversarial Robustness: A Summary
The paper "On Evaluating Adversarial Robustness," authored by Nicholas Carlini et al., presents a comprehensive guide on the evaluation practices related to defenses against adversarial examples in machine learning. The document addresses the methodological challenges and recurring pitfalls that hinder accurate assessment of adversarial robustness. This discussion is particularly relevant given the rapid advancements and broadening applications of machine learning, which necessitate reliable defenses against adversarial attacks.
Adversarial examples are maliciously crafted inputs designed to deceive machine learning models into making incorrect predictions. The robustness of a machine learning system to such examples is a critical security concern. Despite numerous proposed defenses, many are soon found to be ineffective when subjected to rigorous and adaptive attacks.
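To make the idea concrete, the sketch below implements the fast gradient sign method (FGSM), one of the simplest ways to craft such inputs. It is a minimal illustration, not a method from the paper; the model, labels, and ε budget are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step FGSM: perturb x in the direction that increases the loss,
    bounded by an l-infinity budget of epsilon (hypothetical value)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the sign of the input gradient, then clip to the valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Even this single-step attack often fools undefended models, which is why the paper treats it only as a baseline rather than evidence of robustness.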
Principles of Defense Evaluation
The paper underscores the importance of establishing a precise threat model. A threat model defines the adversary’s goals, knowledge, and capabilities, guiding the evaluation process. Key aspects include:
- Adversary Goals: The nature of the misclassification the adversary aims to achieve.
- Adversary Capabilities: Restrictions on the adversary's perturbations, typically bounded by an ℓp-norm constraint (e.g., ℓ∞ or ℓ2) on the perturbation size (see the sketch after this list).
- Adversary Knowledge: The assumption that the attacker knows the defense mechanism, reflecting Kerckhoffs' principle from cryptography.
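A threat model can be stated in code as well as prose. The sketch below is a minimal, hypothetical specification of a white-box ℓ∞ threat model; the field names and example values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
import torch

@dataclass(frozen=True)
class ThreatModel:
    """Illustrative threat-model specification (hypothetical fields)."""
    goal: str        # e.g. "untargeted" or "targeted" misclassification
    norm: str        # which l_p norm bounds the perturbation, e.g. "linf"
    epsilon: float   # maximum allowed perturbation size under that norm
    knowledge: str   # e.g. "white-box": attacker knows the full defense

# Example: white-box, untargeted attacks with ||delta||_inf <= 8/255.
LINF_WHITE_BOX = ThreatModel(goal="untargeted", norm="linf",
                             epsilon=8 / 255, knowledge="white-box")

def within_budget(x: torch.Tensor, x_adv: torch.Tensor, tm: ThreatModel) -> bool:
    """Check that an adversarial example respects the stated l_inf budget."""
    assert tm.norm == "linf"
    return float((x_adv - x).abs().max()) <= tm.epsilon + 1e-8
```

Writing the threat model down this explicitly makes it harder for an evaluation to silently drift from the assumptions it claims to test.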
Methodological Recommendations
The paper emphasizes adaptive adversaries who are aware of the defense and modify their attack strategies accordingly. Evaluators should:
- Perform Adaptive Attacks: Apply attacks tailored to the specific defense under the stated threat model; non-adaptive attacks are insufficient to support robustness claims (see the sketch after this list).
- Release Source Code and Models: Ensure the reproducibility of results by sharing the source code and pre-trained models.
- Report Clean Accuracy: Evaluate the model’s performance on clean data to understand the trade-off between robustness and accuracy.
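To give a sense of what "adaptive" means in practice, the sketch below runs projected gradient descent (PGD) through the entire defended pipeline, including any preprocessing, rather than through the undefended base classifier. The `defended_model` argument, step size, and iteration count are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(defended_model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=40):
    """l_inf PGD whose gradients flow through the full defended pipeline,
    so the attack adapts to the defense (hypothetical parameters)."""
    x = x.clone().detach()
    # Random start inside the epsilon ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(defended_model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back onto the l_inf ball around x, then onto [0, 1].
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

If a defense relies on obfuscated gradients, an adaptive evaluation would go further, for example by replacing non-differentiable components with differentiable approximations before attacking.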
Evaluation Checklist
A detailed checklist is provided to avoid common evaluation flaws. Key items include:
- State the threat model explicitly: Ensure clear communication of adversarial assumptions.
- Verify attack performance: Iterative attacks should outperform single-step ones, and increasing the perturbation budget should increase attack success (see the sanity-check sketch after this list).
- Conduct diverse attacks: Use gradient-based, gradient-free, and transfer-based attacks to ensure comprehensive evaluations.
- Report detailed results: Include hyperparameters, attack settings, and per-example success rates for transparency.
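One cheap way to catch broken evaluations is to automate these sanity checks. The sketch below sweeps the perturbation budget and flags the common failure mode where reported robust accuracy does not decrease as ε grows; `pgd_attack` refers to the illustrative attack sketched earlier, and the ε values are assumptions.

```python
import torch

def robust_accuracy(model, attack, loader, epsilon):
    """Fraction of examples still classified correctly after the attack."""
    correct = total = 0
    for x, y in loader:
        x_adv = attack(model, x, y, epsilon=epsilon)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def sweep_epsilon(model, attack, loader,
                  epsilons=(2 / 255, 4 / 255, 8 / 255, 16 / 255)):
    """Robust accuracy should fall (weakly) monotonically as the budget grows;
    a violation usually signals gradient masking or a buggy attack."""
    accs = [robust_accuracy(model, attack, loader, eps) for eps in epsilons]
    for eps, acc in zip(epsilons, accs):
        print(f"epsilon={eps:.4f}  robust accuracy={acc:.3f}")
    if any(later > earlier + 1e-3 for earlier, later in zip(accs, accs[1:])):
        print("WARNING: robust accuracy increased with epsilon; re-check the attack.")
    return accs
```

Reporting the full sweep, together with per-example success rates and attack hyperparameters, gives reviewers the transparency the checklist calls for.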
Implications and Future Developments
The paper's recommendations aim to standardize evaluation practices and foster objectivity in adversarial robustness research. By emphasizing rigorous and reproducible methods, the authors contribute to building more resilient machine learning systems. As robustness evaluations become more intricate, future work may investigate provable defenses, extend testing to application domains beyond images, and explore adaptive strategies in adversarial training.
In conclusion, Carlini et al.'s paper addresses the urgent need for methodological rigor in evaluating adversarial defenses. By offering practical guidelines, it aims to improve the consistency and reliability of adversarial robustness assessments, ultimately contributing to more secure and trustworthy AI systems.