Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Published 3 Mar 2020 in cs.LG, cs.CV, and stat.ML | (2003.01690v2)

Abstract: The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than $10\%$, identifying several broken defenses.


Authors (2)

Citations (1,612)


Summary

The paper introduces Auto-PGD (APGD), an extension of the PGD attack that removes manual step-size tuning, together with an alternative loss function for gradient-based attacks.
The paper shows that an ensemble of APGD, FAB, and Square Attack yields robust accuracy lower than the reported value, often by more than 10%.
The paper validates its ensemble on more than 50 models, uncovering significant robustness gaps with accuracy drops exceeding 30% in some cases.

Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks

The paper under review addresses a central problem in the evaluation of adversarial defenses, which matters for the robustness and safety of machine learning systems. Many defenses have been proposed over the years, but their evaluations are often insufficient and therefore give a wrong impression of how robust they are.

Key Contributions

The authors make several notable contributions to this domain:

Novel Extensions of PGD Attack: The paper introduces two extensions of the well-known PGD attack, each addressing a weakness of standard PGD. The first is a gradient-based scheme termed Auto-PGD (APGD), which adapts its step size during the attack and so removes the need to select one manually. The second is an alternative loss function, the Difference of Logits Ratio (DLR) loss, designed to overcome the limitations of the cross-entropy loss as an attack objective.
Ensemble of Attacks: By combining the newly proposed APGD variants with two existing attacks, FAB and Square Attack, the authors create a parameter-free, user-independent ensemble aimed at providing a more reliable evaluation of adversarial robustness. The ensemble does not require tuning for each new defense, which is a significant advantage for standardized robustness testing.
Large-Scale Evaluation: The proposed ensemble is applied to over 50 models from papers published at recent top machine learning and computer vision venues. In almost all cases it yields a robust accuracy lower than the one reported, often by a significant margin, revealing previously undetected vulnerabilities.
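The adaptive step size of APGD can be illustrated with a minimal sketch of a checkpoint rule. It assumes the attack inspects its progress at fixed checkpoints and halves the step size when either too few recent updates increased the objective, or the step size was not reduced at the previous checkpoint and the best objective value has not improved since then. The function name, argument names, and the threshold value used in the example are illustrative assumptions, not the authors' code.

```python
def should_halve_step(losses, rho, step_was_reduced_last,
                      best_at_last_checkpoint, best_now):
    """Decide whether to halve the step size at a checkpoint (hypothetical helper).

    losses: objective values at the iterations since the previous checkpoint,
            starting with the value at that checkpoint.
    rho:    minimum fraction of updates that must increase the objective.
    """
    # Condition 1: count the updates in this window that increased the objective.
    n_updates = len(losses) - 1
    improved = sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev)
    too_few_improvements = improved < rho * n_updates

    # Condition 2: the step size was kept at the previous checkpoint and
    # the best objective value found so far has not increased since then.
    stalled = (not step_was_reduced_last) and best_now <= best_at_last_checkpoint

    return too_few_improvements or stalled


# Assumed threshold for illustration: at least 75% of updates must improve.
steady = should_halve_step([1.0, 1.1, 1.2, 1.3, 1.4], 0.75, True, 2.0, 2.0)
oscillating = should_halve_step([1.0, 0.9, 1.1, 1.0, 1.2], 0.75, True, 2.0, 2.0)
```

In this sketch an oscillating objective triggers a smaller step, while steady progress keeps the current one. A full attack would also need the projected gradient update itself and a rule for where to restart after a reduction, both omitted here.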

Numerical Results and Claims

The empirical results are striking:

The ensemble of attacks achieves a robust accuracy lower than reported in the original papers in all but one of the over 50 evaluated models, highlighting its effectiveness.
The reduction in robust accuracy often exceeds 10%, with several cases showing a reduction of more than 30%, demonstrating significant shortcomings in existing evaluation protocols.

For example, several of the evaluated models are WideResNet-28-10 networks from different papers, for which drops of 2-3 percentage points relative to the reported robust accuracy are not uncommon. In some cases the gap is far larger: for the model from [Wang and Zhang (2019, ICCV)], robust accuracy falls by about 39.25 percentage points. Gaps of this size indicate that the original evaluation substantially overestimated the robustness of the model.
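The robust accuracy of an ensemble is naturally computed as a per-sample worst case: a test point counts as robust only if it is classified correctly and none of the attacks finds an adversarial example for it. The sketch below illustrates that bookkeeping on toy data. The function name, the attack labels, and the boolean arrays are made up for illustration and do not come from the paper.

```python
import numpy as np


def ensemble_robust_accuracy(correct_clean, attack_success):
    """Worst-case robust accuracy over several attacks (hypothetical helper).

    correct_clean:  bool array (n,), True where the clean input is classified correctly.
    attack_success: dict mapping attack name -> bool array (n,), True where that
                    attack found an adversarial example.
    Returns (ensemble_accuracy, per_attack_accuracy).
    """
    clean = np.asarray(correct_clean, dtype=bool)
    robust = clean.copy()
    per_attack = {}
    for name, success in attack_success.items():
        broken = np.asarray(success, dtype=bool)
        # Robust accuracy if only this attack had been run.
        per_attack[name] = float(np.mean(clean & ~broken))
        # A point stays robust only if every attack fails on it.
        robust &= ~broken
    return float(robust.mean()), per_attack


# Toy data: five test points, two attacks that break different points.
clean = [True, True, True, True, False]
successes = {
    "attack_a": [True, False, False, False, False],
    "attack_b": [False, True, False, False, False],
}
ensemble_acc, single_acc = ensemble_robust_accuracy(clean, successes)
```

In this toy case each attack alone leaves 60% robust accuracy, while the ensemble leaves 40%, which is why attacks that fail on different points are worth combining. The gap to a reported figure is then the reported robust accuracy minus the ensemble value, in percentage points.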

Implications and Future Directions

Practical Implications

For practitioners, the proposed ensemble provides a reliable, computationally affordable, and user-independent method for evaluating adversarial defenses. It can be integrated into the standard evaluation pipeline for any new defense, so that models are tested against diverse and effective attacks. This addresses a significant gap in current practice, where poorly tuned attack hyperparameters or attack configurations chosen for one specific defense can lead to overestimated robustness.

Theoretical Implications

On the theoretical side, a gradient-based attack that adapts its step size dynamically and needs no manual tuning (APGD) poses interesting questions for the optimization community. The Difference of Logits Ratio (DLR) loss avoids the gradient masking observed with the cross-entropy loss, largely because it is invariant to shifts and rescaling of the logits. This suggests a promising avenue for designing further attack objectives with the same invariances.
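To make the invariance concrete, here is a minimal NumPy sketch of the untargeted DLR loss, assuming its usual form: the negated margin between the true-class logit and the largest other logit, divided by the difference between the largest and third-largest logits. The function name and the toy logits are illustrative. The sketch needs at least three classes and omits any safeguard against a zero denominator.

```python
import numpy as np


def dlr_loss(logits, labels):
    """Untargeted Difference of Logits Ratio loss, one value per sample (sketch)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(logits.shape[0])

    # Logits sorted in decreasing order along the class axis.
    sorted_desc = -np.sort(-logits, axis=1)

    # Logit of the true class and the largest logit among the other classes.
    z_true = logits[rows, labels]
    masked = logits.copy()
    masked[rows, labels] = -np.inf
    z_other = masked.max(axis=1)

    # Negated margin, normalized by the gap between the top and third logits.
    return -(z_true - z_other) / (sorted_desc[:, 0] - sorted_desc[:, 2])


z = np.array([[3.0, 1.0, 0.0, -1.0]])
y = np.array([0])
base = dlr_loss(z, y)                    # correctly classified: negative loss
transformed = dlr_loss(5.0 * z + 7.0, y)  # same value after rescaling and shifting
```

Because numerator and denominator are both differences of logits, a shift cancels and a positive rescaling divides out, so `base` and `transformed` agree. The cross-entropy loss does not have this property: rescaling the logits changes its value and the size of its gradient.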

Future Developments

For future research, the combination of white-box and black-box attacks within a single ensemble is particularly compelling. Extending the framework to other norms and adding further black-box attacks could strengthen robustness testing protocols. Exploring such ensembles outside image classification, for example in natural language processing, could also yield valuable insights.

Furthermore, the results on randomized defenses emphasize the need for methods like APGD that can handle the stochastic components of a model effectively. This points to future work on adaptive attacks that remain effective when model outputs vary randomly.

Conclusion

The paper offers a significant advancement in the evaluation of adversarial robustness by proposing an ensemble of diverse, parameter-free attacks that comprehensively test the defenses of deep learning models. These contributions are not only practical but also bring fresh perspectives to theoretical aspects of robustness evaluation. By highlighting the often-overlooked flaws in current assessment methodologies, this research paves the way for more rigorous and reliable robustness verification in machine learning systems.
