Semidefinite relaxations for certifying robustness to adversarial examples (1811.01057v1)

Published 2 Nov 2018 in cs.LG, cs.CR, and stat.ML

Abstract: Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs---imperceptibly but adversarially perturbed versions of natural inputs. We have witnessed an arms race between defenders who attempt to train robust networks and attackers who try to construct adversarial examples. One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family. These certified defenses are based on convex relaxations which construct an upper bound on the worst case loss over all attackers in the family. Previous relaxations are loose on networks that are not trained against the respective relaxation. In this paper, we propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. We show that our proposed relaxation is tighter than previous relaxations and produces meaningful robustness guarantees on three different "foreign networks" whose training objectives are agnostic to our proposed relaxation.

Citations (419)

Summary

  • The paper introduces an SDP-based certification method that provides tighter robustness bounds for arbitrary multi-layer ReLU networks against adversarial attacks.
  • It outperforms LP-based approaches by reducing non-certified cases and achieving superior error guarantees on datasets like MNIST.
  • The results offer practical tools for enhancing network defenses, even on models not originally trained with certification in mind.

Semidefinite Relaxations for Certifying Robustness to Adversarial Examples

The paper presents a novel approach that employs semidefinite programming (SDP) to certify the robustness of neural networks against adversarial examples, focusing on multi-layer ReLU networks. It frames certified defenses as a response to the ongoing adversarial arms race, in which attackers continually find new ways to construct adversarial examples that corrupt a network's predictions. The approach distinguishes itself by offering a tighter relaxation than previous convex relaxations, particularly those based on linear programming (LP).
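
Concretely, the certification problem that these relaxations approximate can be stated as follows (a standard formulation consistent with the paper's setup): for an input $\bar{x}$ with true label $y$ and an $\ell_\infty$ attack budget $\epsilon$, the network $f$ is certified robust at $\bar{x}$ if, for every incorrect label $y'$,

$$\max_{\|x - \bar{x}\|_\infty \le \epsilon} \; f_{y'}(x) - f_y(x) < 0.$$

This maximization is intractable in general, so certified defenses compute a tractable upper bound on it instead: a negative upper bound certifies robustness, while a loose bound may fail to certify an example that is in fact robust.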

Key Contributions

  1. Introduction of SDP for Robustness Certification: The paper introduces an SDP-based certification method that applies to arbitrary ReLU networks. The proposed relaxation upper-bounds the worst-case loss over all attacks in the perturbation family by constructing and optimizing a semidefinite program, and it is shown to be tighter than existing LP-based methods, yielding stronger robustness guarantees (a minimal illustrative sketch of such a relaxation appears after this list).
  2. Evaluation on Foreign Networks: A significant finding is that the SDP relaxation provides meaningful robustness guarantees even for "foreign" networks whose training objectives are agnostic to this certification method; on such networks it still produces tight certificates and outperforms methods designed around LP relaxations.
  3. Comparison with Other Certifications: In empirical evaluations, the SDP approach outperforms earlier certification methods such as LP-based certificates and gradient norm bounds. The paper includes rigorous comparisons across networks trained with different methodologies (e.g., adversarial training, gradient-based regularization), showing that SDP-cert consistently yields fewer non-certified examples under adversarial attack.
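
To make the construction in item 1 concrete, the sketch below shows a moment-style SDP relaxation of the kind the paper proposes, specialized to a single hidden layer and written with cvxpy. The function name sdp_certify_margin, the choice of solver, and the reduction to a single margin vector c (assumed here to encode the difference between the attacked and true logits through the output layer, with constant offsets omitted) are illustrative assumptions rather than the paper's implementation; the actual certificate handles multiple layers and additional constraints.

```python
import numpy as np
import cvxpy as cp

def sdp_certify_margin(W, b, c, x_nat, eps):
    """Upper-bound  max_{||x - x_nat||_inf <= eps}  c^T ReLU(W x + b)
    via a moment-style SDP relaxation (one hidden layer, illustrative sketch).
    A negative optimal value certifies that no perturbation in the eps-ball
    can make the margin positive."""
    d, m = x_nat.size, b.size
    n = 1 + d + m                            # moment vector v = [1; x; z]

    P = cp.Variable((n, n), symmetric=True)  # relaxes the rank-one matrix v v^T
    x = P[0, 1:1 + d]                        # first moments of the input
    z = P[0, 1 + d:]                         # first moments of the activations
    X = P[1:1 + d, 1:1 + d]                  # second moments x x^T
    Zx = P[1 + d:, 1:1 + d]                  # cross moments z x^T
    Z = P[1 + d:, 1 + d:]                    # second moments z z^T

    l, u = x_nat - eps, x_nat + eps
    cons = [P >> 0, P[0, 0] == 1]
    # ReLU as quadratic constraints, linearized in the moments:
    # z >= 0, z >= Wx + b, and z_i * (z_i - (Wx + b)_i) = 0 for every unit.
    cons += [z >= 0, z >= W @ x + b]
    cons += [cp.diag(Z) == cp.diag(Zx @ W.T) + cp.multiply(b, z)]
    # Input box l <= x <= u encoded quadratically: x_i^2 <= (l_i + u_i) x_i - l_i u_i.
    cons += [cp.diag(X) <= cp.multiply(l + u, x) - cp.multiply(l, u)]

    prob = cp.Problem(cp.Maximize(c @ z), cons)
    prob.solve(solver=cp.SCS)
    return prob.value
```

The key idea mirrors the paper's: the exact ReLU constraint is rewritten as quadratic constraints, which become linear in the entries of the moment matrix $P \approx v v^\top$, and relaxing $P$ to any positive semidefinite matrix with unit leading entry yields a convex problem whose optimum upper-bounds the true worst-case margin.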

Numerical Results and Analysis

The SDP method delivers substantial improvements over previous approaches, certifying models on challenging datasets like MNIST with a tighter gap between empirical adversarial error and the certified upper bound. For instance, on a four-layer fully connected network trained adversarially, SDP certification brings the certified error down to 18% at a perturbation level of ε = 0.1, compared with a lower bound of 9% established by PGD attacks.

The paper also addresses the challenge of scaling the SDP and develops a geometric interpretation of the problem setup, which is key to understanding why the SDP relaxation performs better than LP relaxations.

Theoretical and Practical Implications

Theoretically, the SDP relaxation adds a deeper level of insight into certification methods, capturing interactions between units that LP relaxations miss. A key result is a proven dimension-dependent gap, showing that the SDP can provide significantly tighter bounds than the LP under certain network configurations.
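
To illustrate the kind of interaction at stake (assuming the standard per-unit "triangle" relaxation as the LP baseline), the LP relaxes each ReLU unit $z_j = \max(0, \hat{x}_j)$ independently using its pre-activation bounds $l_j \le \hat{x}_j \le u_j$:

$$z_j \ge 0, \qquad z_j \ge \hat{x}_j, \qquad z_j \le \frac{u_j (\hat{x}_j - l_j)}{u_j - l_j},$$

so its feasible set is a product over units. The SDP keeps the first two inequalities but replaces the per-unit upper bound with the quadratic equality $z_j(z_j - \hat{x}_j) = 0$, linearized in a single moment matrix $P \succeq 0$ shared by all units; this joint constraint can exclude combinations of unit values that the LP accepts, which is one way to see why the SDP can be tighter.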

Practically, this work provides tools to certify the robustness of networks deployed in real-world applications, where adversarial perturbations might otherwise significantly degrade performance. It extends certified robust learning to models that were not trained with certification in mind, helping preserve a network's reliability in adversarial settings.

Future Directions

Looking forward, the paper identifies open questions, chiefly extending the empirical success of the approach to larger architectures such as convolutional neural networks (CNNs). Adapting the relaxation to CNNs would broaden its applicability, given how common CNNs are in practical AI systems. Additionally, exploring specialized SDP solvers tailored to this class of problems could substantially improve scalability and the real-world applicability of robustness certification.

In conclusion, the paper broadens the landscape of certified defenses against adversarial inputs by leveraging SDP, contributing a noteworthy step toward addressing the vulnerabilities of neural network deployments under adversarial attack.