Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations (2404.03348v2)

Published 4 Apr 2024 in cs.LG, cs.AI, cs.CR, and cs.CY

Abstract: In recent years, there has been a notable increase in the deployment of ML models as services (MLaaS) across diverse production software applications. In parallel, explainable AI (XAI) continues to evolve, addressing the need for transparency and trustworthiness in ML models. XAI techniques aim to enhance the transparency of ML models by providing insights, in the form of explanations, into their decision-making process. Simultaneously, some MLaaS platforms now offer explanations alongside the ML prediction outputs. This setup has elevated concerns regarding vulnerabilities in MLaaS, particularly in relation to privacy leakage attacks such as model extraction attacks (MEA), because explanations can unveil insights about the inner workings of the model that malicious users could exploit. In this work, we investigate how model explanations, particularly counterfactual explanations (CFs), can be exploited to perform MEA within the MLaaS platform. We also assess the effectiveness of incorporating differential privacy (DP) as a mitigation strategy. To this end, we first propose a novel MEA approach based on knowledge distillation (KD) to improve the efficiency of extracting a substitute model of a target model by exploiting CFs, without requiring the attacker to have any knowledge of the training data distribution. We then devise an approach for training CF generators that incorporates DP in order to generate private CFs. We conduct thorough experimental evaluations on real-world datasets and demonstrate that our proposed KD-based MEA can yield a high-fidelity substitute model with fewer queries than baseline approaches. Furthermore, our findings reveal that adding a privacy layer can mitigate the MEA; however, this comes at the cost of the quality of the CFs, impacting the performance of the explanations.
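
To make the idea concrete, the sketch below illustrates one way a knowledge-distillation-based extraction loop could exploit counterfactual explanations returned alongside predictions. This is a minimal illustration, not the authors' implementation: `query_mlaas`, the substitute architecture, the distillation temperature, and the assumption that each counterfactual carries the flipped class label (binary setting) are hypothetical choices introduced for exposition.

```python
# Minimal sketch (not the paper's implementation) of a KD-based model
# extraction loop exploiting counterfactual explanations from an MLaaS API.
# `query_mlaas` is a hypothetical stand-in for the target service; it is
# assumed to return soft class probabilities plus one counterfactual per
# queried input, for a binary classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F


def query_mlaas(x: torch.Tensor):
    """Hypothetical target API: returns (probs [N, 2], counterfactuals [N, d])."""
    raise NotImplementedError  # replace with real calls to the MLaaS platform


class Substitute(nn.Module):
    """Small student network trained to mimic the black-box target."""
    def __init__(self, d_in: int, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, x):
        return self.net(x)


def kd_loss(student_logits, teacher_probs, T: float = 2.0):
    # Soft-label distillation: KL divergence between temperature-scaled
    # student and teacher distributions.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(torch.log(teacher_probs + 1e-12) / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T


def extract(student: Substitute, query_pool: torch.Tensor,
            rounds: int = 50, batch: int = 64, lr: float = 1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(rounds):
        x = query_pool[torch.randperm(len(query_pool))[:batch]]
        probs, cfs = query_mlaas(x)  # one query -> soft prediction + CF
        # A counterfactual is, by construction, classified into the opposite
        # class, so it provides a second labelled point near the decision
        # boundary at no extra query cost (hard label only).
        cf_labels = 1 - probs.argmax(dim=1)
        opt.zero_grad()
        loss = kd_loss(student(x), probs) + F.cross_entropy(student(cfs), cf_labels)
        loss.backward()
        opt.step()
    return student
```

Because counterfactuals lie close to the target model's decision boundary, each query contributes unusually informative supervision, which is why CF-assisted extraction can reach high fidelity with fewer queries. The DP mitigation studied in the paper would instead be applied on the provider side, e.g. by training the CF generator with a differentially private procedure such as per-example gradient clipping plus Gaussian noise; that side is not sketched here.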

Authors (3)
  1. Fatima Ezzeddine (5 papers)
  2. Omran Ayoub (8 papers)
  3. Silvia Giordano (24 papers)
