Adversarially Robust Distillation

Published 23 May 2019 in cs.LG, cs.CV, and stat.ML | (1905.09747v2)

Abstract: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.

Summary

The paper demonstrates that ARD techniques reduce multiply-add operations significantly while maintaining competitive robustness using compact student models.
The analysis reveals that lowering the α parameter rapidly degrades robustness, whereas temperature changes have minimal impact, highlighting key accuracy-robustness trade-offs.
Experiments show that strategic data augmentation and natural teacher training can enhance efficiency, emphasizing ARD’s role in balancing computational cost with performance.

An Evaluation of Adversarially Robust Distillation (ARD) Techniques

This paper evaluates Adversarially Robust Distillation (ARD) techniques, focusing on the efficiency of teacher and student models and on how various hyperparameters affect the trade-off between natural and robust accuracy. The authors examine ARD with different neural network architectures as teacher-student pairs, using WideResNet and ResNet18 as teacher models and MobileNetV2 as the student model.

The research assesses the space and time efficiency of these models. ResNet18 and WideResNet have approximately $11.2$ million and $46.2$ million parameters, respectively, while the MobileNetV2 student model is significantly more compact with $2.3$ million parameters. Time efficiency was measured in multiply-add (MAdd) operations: a forward pass through the student requires only about $1.4\%$ of the MAdd operations needed by the WideResNet.
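To make the size comparison concrete, the short sketch below computes the student's parameter count relative to each teacher from the figures quoted above. The dictionary and the helper name `relative_size` are illustrative choices, not part of the paper, and the counts are the approximate values reported in this summary.

```python
# Approximate parameter counts (in millions) as quoted in the summary above.
PARAMS_MILLIONS = {
    "WideResNet": 46.2,
    "ResNet18": 11.2,
    "MobileNetV2": 2.3,
}


def relative_size(student, teacher, params=PARAMS_MILLIONS):
    """Return the student's parameter count as a fraction of the teacher's."""
    return params[student] / params[teacher]


if __name__ == "__main__":
    for teacher in ("WideResNet", "ResNet18"):
        fraction = relative_size("MobileNetV2", teacher)
        print(f"MobileNetV2 has {100 * fraction:.1f}% of the parameters of {teacher}")
```

By these figures the student has roughly 5% of the WideResNet's parameters and roughly 20% of ResNet18's. The MAdd ratio is a separate measurement and cannot be derived from parameter counts alone.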

The paper investigates how the temperature and $\alpha$ parameters affect knowledge distillation outcomes, particularly the trade-off between natural and robust accuracy. Temperature has minimal effect on robustness, whereas reducing $\alpha$ causes robustness to decline rapidly, indicating an accuracy-robustness trade-off that is most pronounced at low $\alpha$ values. Several data augmentation approaches were also considered. Basic strategies such as horizontal flips and random cropping were found effective for robustness without incurring the drawbacks associated with training on the teacher's behavior at adversarial points.

The experimental tables show how adjusting hyperparameters such as temperature and $\alpha$ affects distillation performance. Additional analysis evaluates other ARD configurations. For instance, including a knowledge distillation $\mathrm{KL}$ divergence term degraded robust accuracy, and substituting $T^t(\mathbf{x}_i')$ for $T^t(\mathbf{x}_i)$ in the loss function was found to be detrimental to overall accuracy.
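The roles of $\alpha$, the temperature, and the teacher output $T^t(\mathbf{x}_i)$ can be illustrated with a minimal sketch of an ARD-style loss for a single example. This is an assumption-laden illustration rather than the authors' implementation. It assumes that the loss is a weighted sum of a cross-entropy term on the clean input and a temperature-scaled KL term, that $\alpha$ weights the KL term, that the KL term compares the student's output on a perturbed input (presumably what $\mathbf{x}_i'$ denotes) with the teacher's output on the clean input, and that the divergence is taken from teacher to student. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


def ard_style_loss(student_logits_clean, student_logits_adv,
                   teacher_logits_clean, label, alpha, temperature):
    """Assumed form: (1 - alpha) * CE + alpha * t^2 * KL.

    The CE term uses the student's output on the clean input; the KL term
    compares the teacher on the clean input with the student on the
    perturbed input. Both choices are assumptions made for illustration.
    """
    s_clean = softmax(student_logits_clean, temperature)
    s_adv = softmax(student_logits_adv, temperature)
    t_clean = softmax(teacher_logits_clean, temperature)
    ce_term = cross_entropy(s_clean, label)
    kl_term = kl_divergence(t_clean, s_adv)
    return (1 - alpha) * ce_term + alpha * temperature ** 2 * kl_term
```

Under this assumed form, setting `alpha` to 0 removes the only term that sees the perturbed input. That is consistent with the reported rapid loss of robustness at low $\alpha$.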

Moreover, naturally trained teacher models under ARD conditions still produced students with some robustness, albeit lower than that of students distilled from adversarially trained teachers. The authors also explored accelerating ARD training by reducing the number of attack steps, which improved natural accuracy but slightly reduced robust accuracy. This is a trade-off to weigh when deploying ARD.
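The effect of the attack step count can be sketched with a toy iterative attack. The example below assumes a PGD-style attack under an $\ell_\infty$ constraint, with signed-gradient ascent steps followed by projection onto the allowed perturbation set. The summary above does not specify the attack, so the function name `linf_attack`, the step size, and the toy linear loss are all illustrative assumptions.

```python
def sign(value):
    """Return -1.0, 0.0, or 1.0 according to the sign of value."""
    return (value > 0) - (value < 0) + 0.0


def linf_attack(grad_fn, x, epsilon, step_size, num_steps):
    """Toy PGD-style attack: signed-gradient ascent within an L-infinity ball.

    grad_fn(point) must return the gradient of the loss at `point` as a list.
    Returns the perturbation delta, with every entry in [-epsilon, epsilon].
    """
    delta = [0.0] * len(x)
    for _ in range(num_steps):
        point = [xi + di for xi, di in zip(x, delta)]
        grad = grad_fn(point)
        # Ascent step, then projection back onto the epsilon-ball.
        delta = [
            min(max(di + step_size * sign(gi), -epsilon), epsilon)
            for di, gi in zip(delta, grad)
        ]
    return delta


if __name__ == "__main__":
    # Toy linear loss w . x, whose gradient is w everywhere.
    weights = [2.0, -1.0, 0.5]
    grad_fn = lambda point: weights
    x = [0.1, 0.2, 0.3]
    for steps in (1, 10):
        delta = linf_attack(grad_fn, x, epsilon=0.03, step_size=0.01, num_steps=steps)
        print(steps, "steps -> max |delta| =", max(abs(d) for d in delta))
```

Each attack step costs one extra gradient computation per training iteration, so using fewer steps makes training cheaper. In this toy case it also leaves the perturbation short of the budget boundary, which gives one intuition for why fewer steps might trade robustness for speed.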

Practically and theoretically, these findings contribute to the ongoing discourse in adversarial training by revealing the intricacies of ARD's hyperparameter sensitivity and operational efficiency. The implications of this work suggest potential refinements in robust model training protocols and highlight the balance between computational cost and accuracy in practice. Future directions might involve adaptive data augmentation tailored for robustness, enhancing the applicability of ARD strategies in real-world scenarios.
