Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates (2402.17390v2)

Published 27 Feb 2024 in cs.LG and cs.CR

Abstract: Machine-learning models demand periodic updates to improve their average accuracy, exploiting novel architectures and additional data. However, a newly updated model may commit mistakes the previous model did not make. Such misclassifications are referred to as negative flips, experienced by users as a regression of performance. In this work, we show that this problem also affects robustness to adversarial examples, hindering the development of secure model update practices. In particular, when updating a model to improve its adversarial robustness, previously ineffective adversarial attacks on some inputs may become successful, causing a regression in the perceived security of the system. We propose a novel technique, named robustness-congruent adversarial training, to address this issue. It amounts to fine-tuning a model with adversarial training, while constraining it to retain higher robustness on the samples for which no adversarial example was found before the update. We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically-grounded framework to train consistent estimators. Our experiments on robust models for computer vision confirm that both accuracy and robustness, even if improved after model update, can be affected by negative flips, and our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.


Summary

  • The paper introduces RCAT to prevent regression in adversarial robustness by minimizing robustness negative flips during model updates.
  • It extends adversarial training with a non-regression penalty to ensure updated models retain prior security performance.
  • Empirical evaluations on image classification models show that RCAT balances improving accuracy and robustness with avoiding regressions better than competing baselines.

Evaluating and Mitigating Regression in Secure Machine Learning Model Updates

Introduction

Recent advances in machine learning have made frequent model updates necessary to exploit novel architectures and additional data for improved performance. In applications such as cybersecurity and image tagging, where data distributions and threat landscapes evolve rapidly, model updates are crucial for maintaining high detection accuracy. However, updating a model introduces a particular challenge: it can cause "negative flips" (NFs), instances where the updated model misclassifies samples that the previous version classified correctly, which users experience as a regression in performance. Building on the concept of NFs, this work highlights another critical dimension of regression concerning adversarial robustness, introducing "robustness negative flips" (RNFs): adversarial examples that were ineffective against the old model but successfully deceive the updated one, thereby regressing the perceived security of the system.
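
To make the two notions concrete, the following is a minimal sketch of how NF and RNF rates could be measured when comparing an old and an updated classifier. It is not the authors' code: the PyTorch setting, the `count_flips` helper, and the `attack` callable (standing in for any evasion attack, e.g., PGD) are illustrative assumptions.

```python
import torch

def count_flips(old_model, new_model, x, y, attack):
    """Return (NF rate, RNF rate) for a batch (x, y)."""
    # Adversarial examples are crafted against each model separately
    # (white-box setting); the attack needs input gradients, so it runs
    # outside torch.no_grad().
    x_adv_old = attack(old_model, x, y)
    x_adv_new = attack(new_model, x, y)

    with torch.no_grad():
        # Negative flip (NF): the old model classified the clean sample
        # correctly, the updated model gets it wrong.
        old_ok = old_model(x).argmax(dim=1).eq(y)
        new_ok = new_model(x).argmax(dim=1).eq(y)
        nf_rate = (old_ok & ~new_ok).float().mean().item()

        # Robustness negative flip (RNF): the attack failed against the
        # old model but succeeds against the updated one.
        old_rob = old_model(x_adv_old).argmax(dim=1).eq(y)
        new_rob = new_model(x_adv_new).argmax(dim=1).eq(y)
        rnf_rate = (old_rob & ~new_rob).float().mean().item()

    return nf_rate, rnf_rate
```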

Regression in Machine Learning Models

While the phenomenon of accuracy regression, quantified through NFs, has been acknowledged and addressed in the literature, the regression of adversarial robustness, i.e., RNFs, has not been thoroughly investigated. Adversarial robustness, a model's resilience against maliciously crafted inputs, is a vital property of secure ML applications. The work shows that, as with NFs, updating a model can also increase its vulnerability to adversarial attacks on specific inputs, a situation analogous to software updates that fix old bugs while introducing new ones.

Robustness-Congruent Adversarial Training (RCAT)

To tackle this dual regression problem, the authors propose a novel methodology named Robustness-Congruent Adversarial Training (RCAT). At its core, RCAT extends adversarial training so that, while improving the model's robustness, it also minimizes RNFs alongside NFs. By incorporating an additional non-regression penalty term into the optimization problem, RCAT ensures that the updated model does not lose robustness on samples that its predecessor already handled correctly under attack. The authors also establish theoretically that learning with non-regression constraints yields a statistically consistent estimator, a crucial property ensuring that the updated model learns the underlying true distribution without compromising convergence rates.
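
As an illustration of where the non-regression penalty enters the training objective, here is a minimal sketch of an RCAT-style loss, not the authors' implementation: the `pgd_attack` callable, the `lambda_nr` weight, and the distillation-style penalty toward the old model's outputs are hypothetical choices used only to show where the constraint acts. The penalty is applied only on adversarial examples that the old model still classifies correctly, mirroring the idea of retaining robustness on samples for which no adversarial example was found before the update.

```python
import torch
import torch.nn.functional as F

def rcat_loss(new_model, old_model, x, y, pgd_attack, lambda_nr=1.0):
    # Craft adversarial examples against the model being fine-tuned.
    x_adv = pgd_attack(new_model, x, y)

    # Standard adversarial-training term on the perturbed inputs.
    logits_new = new_model(x_adv)
    at_loss = F.cross_entropy(logits_new, y)

    with torch.no_grad():
        # The old model's view of the same perturbed inputs; samples it
        # still classifies correctly are those where a robustness
        # regression must be discouraged.
        logits_old = old_model(x_adv)
        old_robust = logits_old.argmax(dim=1).eq(y).float()

    # Non-regression penalty (one possible choice): a distillation-style
    # term pulling the new model toward the old model's outputs, applied
    # only where the old model was still correct under attack.
    per_sample_kl = F.kl_div(
        F.log_softmax(logits_new, dim=1),
        F.softmax(logits_old, dim=1),
        reduction="none",
    ).sum(dim=1)
    nr_loss = (old_robust * per_sample_kl).mean()

    return at_loss + lambda_nr * nr_loss
```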

Empirical Evaluation on Image Classification Models

The empirical analysis focuses on robust models for image classification, where frequent updates are common to maintain system performance. Through extensive experiments involving various model updates, the authors demonstrate the existence and impact of RNFs in practice. They compare RCAT with existing methodologies such as Positive-Congruent Training (PCT) and its robust extension (PCAT), showing RCAT's superior ability to balance the minimization of NFs and RNFs. This balance is critical because an improvement in average accuracy or robustness after an update does not by itself guarantee improved security on every input.

Implications and Future Directions

This work opens a new avenue in research on secure ML model updates by highlighting the overlooked aspect of robustness regression. The findings emphasize the need for a holistic evaluation of model updates that considers both accuracy and adversarial robustness regressions. As future work, the authors suggest exploring different loss functions and regularizers to further mitigate regressions, and extending the RCAT methodology to other domains where model updates are frequent and critical for maintaining performance.

In conclusion, the paper underscores the importance of cautious and informed model updating processes in ML applications, especially those requiring high levels of security against adversarial threats. By introducing RCAT, the authors provide a pioneering solution for reducing regressions, paving the way for more secure and reliable ML model updates.
