The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness (2404.01356v1)
Abstract: Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, which can reduce either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness requires that predictions for an instance and its similar counterparts consistently align with the ground truth when subjected to input perturbations. We propose an adversarial attack approach, dubbed RAFair, to expose false or biased adversarial defects in DNNs, i.e., perturbed instances that either undermine prediction accuracy or compromise individual fairness. We then show that such adversarial instances can be effectively addressed by carefully designed benign perturbations that correct their predictions to be both accurate and fair. Our work explores the double-edged sword of input perturbations to robust accurate fairness in DNNs and the potential of benign perturbations to correct adversarial instances.
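As a concrete illustration of the definition, the sketch below empirically probes robust accurate fairness for a single instance by sampling random perturbations within an L∞ budget: the instance and all of its similar counterparts must keep being predicted as the ground-truth label under each perturbation. The interface (`model.predict`, a `similar_fn` that generates counterparts differing only in protected attributes, the budget `eps`) is assumed for illustration and is not the paper's actual API; random sampling can only falsify the property, not certify it.

```python
import numpy as np

def is_robust_accurate_fair(model, x, y, similar_fn, eps, n_samples=100):
    """Empirically probe robust accurate fairness for one instance.

    Hypothetical interface: `similar_fn(x)` returns counterparts of `x`
    differing only in protected attributes; `model.predict` maps a batch
    of instances to predicted labels. Sampling random perturbations can
    only falsify the property, not prove it.
    """
    counterparts = similar_fn(x)
    for _ in range(n_samples):
        # Draw a random perturbation inside an L-infinity ball of radius eps.
        delta = np.random.uniform(-eps, eps, size=x.shape)
        # The perturbed instance must still be predicted as its ground truth...
        if model.predict((x + delta)[None])[0] != y:
            return False
        # ...and every perturbed similar counterpart must agree with it.
        for xc in counterparts:
            if model.predict((xc + delta)[None])[0] != y:
                return False
    return True
```

Under this reading, a "false" adversarial defect is a perturbation that breaks the first check (accuracy), while a "biased" defect breaks the second (individual fairness); RAFair searches for such perturbations, and benign perturbations move instances back into the region where both checks pass.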