Adversarial Attacks Neutralization via Data Set Randomization (2306.12161v1)
Abstract: Adversarial attacks on deep-learning models pose a serious threat to their reliability and security. Existing defense mechanisms are often narrow, addressing only a specific type of attack, or remain vulnerable to sophisticated attacks. We propose a new defense mechanism that, while focused on image-based classifiers, is general with respect to the attack category. It is rooted in hyperspace projection: our solution applies a pseudo-random projection that maps the original dataset into a new one. The proposed defense mechanism creates a set of diverse projected datasets, each used to train a dedicated classifier, resulting in a pool of trained classifiers with distinct decision boundaries. At test time, it randomly selects one of these classifiers to evaluate the input. Our approach does not sacrifice accuracy on legitimate inputs. Beyond detailing and thoroughly characterizing our defense mechanism, we provide a proof of concept on the MNIST dataset using four optimization-based adversarial attacks (PGD, FGSM, IGSM, and C&W) and a generative adversarial attack. Our experimental results show that our solution increases the robustness of deep learning models against adversarial attacks, reducing the attack success rate by at least 89% for optimization-based attacks and 78% for generative attacks. We also analyze the relationship between the number of hyperspaces used and the efficacy of the defense mechanism; as expected, the two are positively correlated, providing an easy-to-tune parameter for enforcing the desired level of security. The generality, scalability, and adaptability of our solution to different attack scenarios, combined with the strong results achieved, not only provide a robust defense against adversarial attacks on deep learning networks but also lay the groundwork for future research in the field.
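The abstract describes the defense as three steps: pseudo-randomly project the training data into several distinct hyperspaces, train one classifier per projected dataset, and pick one of those classifiers at random for each test input. The sketch below is a minimal illustration of that idea, not the authors' implementation; the names `make_projection` and `RandomizedEnsemble`, the use of orthogonal projections, and the seed-handling are assumptions introduced here for clarity.

```python
# Minimal sketch (not the paper's code) of a randomized-projection ensemble:
# every classifier sees the data through a different pseudo-random projection,
# and inference picks one projected classifier at random.
import numpy as np


def make_projection(dim, seed):
    """Pseudo-random orthogonal projection of the input hyperspace,
    derived deterministically from a seed (assumed to be kept secret)."""
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields a random orthogonal basis.
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q


class RandomizedEnsemble:
    def __init__(self, train_fn, n_models, dim, master_seed=0):
        # train_fn(X, y) must return a fitted model exposing .predict(X);
        # it stands in for whatever image classifier is used in practice.
        self.seeds = [master_seed + i for i in range(n_models)]
        self.projections = [make_projection(dim, s) for s in self.seeds]
        self.train_fn = train_fn
        self.models = []

    def fit(self, X, y):
        # One classifier per projected dataset -> diverse decision boundaries.
        self.models = [self.train_fn(X @ P, y) for P in self.projections]
        return self

    def predict(self, X):
        # A classifier (and its projection) is chosen at random per query,
        # so an attacker cannot target a single fixed decision boundary.
        i = np.random.randint(len(self.models))
        return self.models[i].predict(X @ self.projections[i])
```

For flattened MNIST images (dim = 784), `train_fn` could be any off-the-shelf learner, e.g. `lambda X, y: LogisticRegression(max_iter=200).fit(X, y)` from scikit-learn (an illustrative choice only). The random per-query selection is what the abstract credits for the reduced attack success rate, and the number of projections (`n_models`) is the tunable security parameter it mentions.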