
Is Certifying $\ell_p$ Robustness Still Worthwhile? (2310.09361v1)

Published 13 Oct 2023 in cs.LG

Abstract: Over the years, researchers have developed myriad attacks that exploit the ubiquity of adversarial examples, as well as defenses that aim to guard against the security vulnerabilities posed by such attacks. Of particular interest to this paper are defenses that provide provable guarantees against the class of $\ell_p$-bounded attacks. Certified defenses have made significant progress, taking robustness certification from toy models and datasets to large-scale problems like ImageNet classification. While this is undoubtedly an interesting academic problem, as the field has matured, its impact in practice remains unclear; we therefore find it useful to revisit the motivation for continuing this line of research. There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research? (2) why do we care about the $\ell_p$-bounded threat model? And (3) why do we care about certification as opposed to empirical defenses? In brief, we take the position that local robustness certification indeed confers practical value to the field of machine learning. We focus especially on the latter two questions. With respect to the first of these, we argue that the $\ell_p$-bounded threat model acts as a minimal requirement for the safe application of models in security-critical domains, while at the same time, evidence has mounted suggesting that local robustness may lead to downstream external benefits not immediately related to robustness. As for the second, we argue that (i) certification provides a resolution to the cat-and-mouse game of adversarial attacks; and (ii) perhaps contrary to popular belief, there may not exist a fundamental trade-off between accuracy, robustness, and certifiability, while certified training techniques constitute a particularly promising way to learn robust models.
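To make the threat model concrete: local robustness certification asks for a proof that a classifier's prediction is constant on an $\ell_p$ ball around an input. In the standard formulation from the certification literature (paraphrased here, not quoted from the paper), a classifier $F$ is certifiably robust at $x$ with radius $\epsilon$ if

$F(x') = F(x)$ for all $x'$ such that $\|x' - x\|_p \le \epsilon$.

One prominent technique for producing such certificates is randomized smoothing (Cohen et al., 2019). The sketch below is a minimal illustration of that idea, not this paper's contribution; base_classifier is a hypothetical function mapping an input array to an integer class label, and the defaults (sigma, n, alpha) are arbitrary choices for exposition.

import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certify_l2(base_classifier, x, sigma=0.25, n=1000, alpha=0.001):
    """Return (predicted label, certified L2 radius), or (None, 0.0) on abstain."""
    # Sample the base classifier's predictions under Gaussian noise.
    noise = np.random.randn(n, *x.shape) * sigma
    preds = np.array([base_classifier(x + d) for d in noise])
    top = int(np.bincount(preds).argmax())        # empirically most likely class
    count = int((preds == top).sum())
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = proportion_confint(count, n, alpha=2 * alpha, method="beta")[0]
    if p_lower <= 0.5:
        return None, 0.0                          # abstain: no certificate here
    # Certified L2 radius from the smoothing guarantee: sigma * Phi^{-1}(p_lower).
    return top, sigma * norm.ppf(p_lower)

Any input within the returned $\ell_2$ radius of $x$ is guaranteed (with high probability over the sampling) to receive the same label from the smoothed classifier, which is the kind of attack-independent guarantee the abstract contrasts with empirical defenses.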

Authors (8)
  1. Ravi Mangal (13 papers)
  2. Klas Leino (14 papers)
  3. Zifan Wang (75 papers)
  4. Kai Hu (55 papers)
  5. Weicheng Yu (3 papers)
  6. Corina Pasareanu (19 papers)
  7. Anupam Datta (51 papers)
  8. Matt Fredrikson (44 papers)
Citations (1)
