Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off (2402.14648v3)
Abstract: Adversarial training often suffers from a robustness-accuracy trade-off, where achieving high robustness comes at the cost of accuracy. One approach to mitigate this trade-off is invariance regularization, which encourages the model's representation to remain invariant under adversarial perturbations; however, it still incurs an accuracy loss. In this work, we closely analyze the challenges of using invariance regularization in adversarial training and show how to address them. Our analysis identifies two key issues: (1) a "gradient conflict" between the invariance and classification objectives, leading to suboptimal convergence, and (2) a mixture distribution problem arising from the divergent distributions of clean and adversarial inputs. To address these issues, we propose Asymmetric Representation-regularized Adversarial Training (ARAT), which incorporates an asymmetric invariance loss with a stop-gradient operation and a predictor to avoid the gradient conflict, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our detailed analysis demonstrates that each component effectively addresses the identified issues, offering novel insights into adversarial defense. ARAT outperforms existing methods across various settings. Finally, we discuss the implications of our findings for knowledge distillation-based defenses, providing a new perspective on their relative successes.
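To make the two components concrete, below is a minimal PyTorch sketch based only on the abstract's description. The names (`Predictor`, `asymmetric_invariance_loss`, `SplitBatchNorm`), the cosine-similarity form of the loss, and all layer sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Predictor(nn.Module):
    """Small MLP head applied only to the adversarial branch (hypothetical sizes)."""

    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def asymmetric_invariance_loss(z_adv: torch.Tensor,
                               z_clean: torch.Tensor,
                               predictor: Predictor) -> torch.Tensor:
    """Asymmetric invariance loss: the predictor transforms the adversarial
    representation, while the clean representation is detached (stop-gradient),
    so the invariance gradient never flows back through the clean branch.
    The negative-cosine form is an assumption, mirroring SimSiam-style losses."""
    p_adv = predictor(z_adv)
    target = z_clean.detach()  # stop-gradient on the clean branch
    return -F.cosine_similarity(p_adv, target, dim=-1).mean()


class SplitBatchNorm(nn.Module):
    """BN with separate statistics and affine parameters for clean vs.
    adversarial inputs, so one set of running estimates is never fit to
    the mixture of the two distributions."""

    def __init__(self, num_features: int):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)
        self.bn_adv = nn.BatchNorm2d(num_features)

    def forward(self, x: torch.Tensor, adversarial: bool) -> torch.Tensor:
        return self.bn_adv(x) if adversarial else self.bn_clean(x)
```

In a training loop, the invariance term would presumably be combined with a classification loss on the adversarial examples, e.g. `loss = F.cross_entropy(logits_adv, y) + lam * asymmetric_invariance_loss(z_adv, z_clean, predictor)`, where `lam` is a hypothetical weighting hyperparameter.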