Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning (2306.15457v1)
Abstract: Deep neural networks are known to be highly vulnerable to adversarial attacks, and many defense algorithms have been proposed to mitigate this vulnerability. Recently, several works have sought to improve adversarial robustness by imposing more direct supervision on discriminative features. However, existing approaches lack an understanding of how to learn adversarially robust feature representations. In this paper, we propose a novel training framework called Robust Proxy Learning, in which the model explicitly learns robust feature representations with robust proxies. To this end, we first demonstrate that class-representative robust features can be generated by adding class-wise robust perturbations. We then use these class-representative features as robust proxies. With these class-wise robust features, the model explicitly learns adversarially robust features through the proposed robust proxy learning framework. Through extensive experiments, we verify that robust features can be generated manually and that the proposed learning framework increases the robustness of DNNs.
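The abstract describes a two-stage procedure: first, optimize a class-wise robust perturbation so that perturbed inputs yield class-representative robust features (the robust proxies); second, train the encoder with a proxy-based loss that pulls instance features toward their class proxy. The following minimal PyTorch sketch illustrates one plausible reading of that procedure; the encoder, the perturbation optimization, and the proxy loss (ProxyEncoder, make_classwise_perturbations, robust_proxy_loss) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyEncoder(nn.Module):
    """Toy encoder + classifier; any CNN backbone could take its place."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=128, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.head(feat)

def make_classwise_perturbations(model, x, y, num_classes, eps=8 / 255, steps=10, lr=0.01):
    """Optimize one shared perturbation per class so that perturbed inputs are
    confidently classified into their class (a class-wise robust perturbation)."""
    deltas = torch.zeros(num_classes, *x.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([deltas], lr=lr)
    for _ in range(steps):
        _, logits = model(torch.clamp(x + deltas[y], 0, 1))
        loss = F.cross_entropy(logits, y)      # pull each perturbed input toward its own class
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            deltas.clamp_(-eps, eps)           # keep the perturbation bounded
    return deltas.detach()

def robust_proxy_loss(feats, y, proxies, temp=0.1):
    """Proxy-style contrastive loss: attract each feature to its class proxy
    and repel it from the other class proxies."""
    feats = F.normalize(feats, dim=1)
    proxies = F.normalize(proxies, dim=1)
    logits = feats @ proxies.t() / temp        # cosine similarity to every proxy
    return F.cross_entropy(logits, y)

# Toy usage with random tensors standing in for CIFAR-10 images.
model = ProxyEncoder()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))

deltas = make_classwise_perturbations(model, x, y, num_classes=10)
with torch.no_grad():
    proxy_feats, _ = model(torch.clamp(x + deltas[y], 0, 1))
    # Average the robust features per class to obtain one robust proxy per class.
    proxies = torch.stack([proxy_feats[y == c].mean(0) if (y == c).any()
                           else torch.zeros(proxy_feats.shape[1]) for c in range(10)])

feats, logits = model(x)
loss = F.cross_entropy(logits, y) + robust_proxy_loss(feats, y, proxies)
opt.zero_grad(); loss.backward(); opt.step()
```

In this reading, the classification loss keeps the features discriminative while the proxy term explicitly anchors them to the class-wise robust features; the exact attraction/repulsion objective and how proxies are refreshed during training are design choices the abstract does not specify.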
Authors: Hong Joo Lee, Yong Man Ro