
Training verified learners with learned verifiers (1805.10265v2)

Published 25 May 2018 in cs.LG and stat.ML

Abstract: This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train two networks: a predictor network that performs the task at hand, e.g., predicting labels given inputs, and a verifier network that computes a bound on how well the predictor satisfies the properties being verified. Both networks can be trained simultaneously to optimize a weighted combination of the standard data-fitting loss and a term that bounds the maximum violation of the property. Experiments show that not only is the predictor-verifier architecture able to train networks to achieve state of the art verified robustness to adversarial examples with much shorter training times (outperforming previous algorithms on small datasets like MNIST and SVHN), but it can also be scaled to produce the first known (to the best of our knowledge) verifiably robust networks for CIFAR-10.

Citations (167)

Summary

  • The paper introduces Predictor-Verifier Training (PVT), a novel dual-network framework that concurrently trains a neural network (predictor) and a verifier to produce models with provable robustness against adversarial perturbations.
  • Empirical validation on MNIST, SVHN, and CIFAR-10 datasets demonstrates that PVT achieves state-of-the-art verified accuracy with significantly reduced training times compared to prior verification methods, including the first nontrivial verified bounds on CIFAR-10.
  • This work offers a scalable and integrated approach to training verifiable machine learning models, laying foundational work for developing more secure and reliable AI systems suitable for critical applications by addressing computational overheads in verification.

Training Verified Learners with Learned Verifiers

The paper "Training Verified Learners with Learned Verifiers" introduces a novel framework for training neural networks that not only fit data effectively but also carry verifiable guarantees of robustness to adversarial perturbations. The authors present a method termed predictor-verifier training (PVT), a dual-network approach composed of a predictor tasked with classification and a verifier that bounds how well the predictor adheres to specified input-output properties. This methodology aims to produce networks with provable robustness, addressing scalability limitations observed in previous verification strategies.

Summary of Contributions

The primary contribution of this work is the integration of the verification process into the training of neural networks, yielding models that provably uphold specified robustness properties against adversarial examples. The research demonstrates that a predictor network can be trained concurrently with a verifier network, with the training objective a weighted combination of the standard classification loss and a bound, computed by the verifier via dual optimization, on the worst-case violation of the property. This interplay amortizes the computational cost of verification across training examples, enabling the approach to scale to more complex models beyond small conventional datasets.
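The shape of this joint objective can be illustrated with a toy sketch. For a linear predictor under an ℓ∞ perturbation of radius ε, the worst-case logits admit a closed form, which stands in below for the bound the paper's verifier network would learn; the linear setting, the mixing weight `kappa`, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax_xent(logits, y):
    """Cross-entropy of a softmax over `logits` against true class `y`."""
    z = logits - logits.max()           # shift for numerical stability
    return np.log(np.exp(z).sum()) - z[y]

def verified_loss(W, b, x, y, eps):
    """Upper bound on the worst-case cross-entropy under ||delta||_inf <= eps.

    For a linear predictor z = W x + b, an adversary can raise each wrong
    logit j by at most eps * ||W_j - W_y||_1; this closed form plays the
    role of the bound that the paper's verifier network learns.
    """
    logits = W @ x + b
    worst = logits + eps * np.abs(W - W[y]).sum(axis=1)
    worst[y] = logits[y]                # the true-class logit is not inflated
    return softmax_xent(worst, y)

def pvt_loss(W, b, x, y, eps, kappa):
    """PVT-style objective: (1 - kappa) * nominal loss + kappa * verified bound."""
    nominal = softmax_xent(W @ x + b, y)
    return (1.0 - kappa) * nominal + kappa * verified_loss(W, b, x, y, eps)
```

Training would minimize `pvt_loss` over the predictor's parameters; in the paper, the bound is instead produced by a second network whose dual variables are trained jointly with the predictor, which is what amortizes the cost of verification across examples.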

Empirical Validation

The empirical section compares performance across the benchmark datasets MNIST, SVHN, and CIFAR-10. The proposed PVT framework achieves state-of-the-art verified robustness to adversarial perturbations, significantly reducing training times relative to prior methods while maintaining competitive accuracy under adversarial attack.

  • MNIST and SVHN Performance: PVT outperforms existing methods, including the earlier technique of Kolter and Wong, providing tighter verified bounds on adversarial error and improved accuracy, with considerably shorter training times.
  • CIFAR-10 Development: The paper reports a milestone: what is claimed to be the first nontrivial verified bounds on adversarial robustness for CIFAR-10, showcasing the method's extensibility to more complex image data.

Implications and Future Directions

This paper puts forth crucial advancements in certified defenses against adversarial attacks, consolidating training and verification into a cohesive process. The PVT framework shows promise for broader applications in real-world tasks requiring strong reliability guarantees. Future explorations could expand upon this work by applying the framework to larger datasets such as ImageNet or designing architectures for verifiable machine learning models with complex specifications in diverse application domains.

In conclusion, by introducing a training paradigm that seamlessly incorporates adversarial verification into the model development pipeline, this paper lays foundational work for more secure and reliable AI systems. The method addresses the notable computational overhead of verification and enables scalable solutions to the intricacies of adversarial robustness in practical implementations. As such, it represents an important step toward models that can be trusted in critical applications exposed to adversarial attacks.