Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations (2404.03348v2)

Published 4 Apr 2024 in cs.LG, cs.AI, cs.CR, and cs.CY

Abstract: In recent years, the deployment of ML models as services (MLaaS) across diverse production software applications has increased notably. In parallel, explainable AI (XAI) continues to evolve, addressing the need for transparency and trustworthiness in ML models. XAI techniques aim to enhance the transparency of ML models by providing insights, in the form of model explanations, into their decision-making process. Some MLaaS platforms now offer explanations alongside the ML prediction outputs. This setup has raised concerns about vulnerabilities in MLaaS, particularly privacy leakage attacks such as model extraction attacks (MEA), because explanations can reveal insights about the inner workings of the model that malicious users could exploit. In this work, we investigate how model explanations, particularly counterfactual explanations (CFs), can be exploited to perform MEA within an MLaaS platform. We also assess the effectiveness of incorporating differential privacy (DP) as a mitigation strategy. To this end, we first propose a novel approach for MEA based on Knowledge Distillation (KD) that enhances the efficiency of extracting a substitute model of a target model by exploiting CFs, without requiring the attacker to have any knowledge of the training data distribution. Then, we devise an approach for training CF generators that incorporates DP to generate private CFs. We conduct thorough experimental evaluations on real-world datasets and demonstrate that our proposed KD-based MEA can yield a high-fidelity substitute model with fewer queries than baseline approaches. Furthermore, our findings reveal that including a privacy layer can mitigate the MEA. However, this comes at the expense of CF quality, which impacts the performance of the explanations.
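
The KD-based extraction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic distillation objective (KL divergence between the target model's returned class probabilities and the substitute model's softmax output) and assumes the attacker simply appends the returned counterfactuals to the set of queried points. The names `softmax`, `distillation_loss`, and `build_transfer_set` are hypothetical helpers introduced here for illustration only.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (numerically stabilised)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_probs, temperature=1.0, eps=1e-12):
    """Mean KL(teacher || student) over a batch.

    teacher_probs: class probabilities returned by the queried target model.
    student_logits: raw outputs of the substitute model on the same inputs.
    This is a generic KD objective, assumed here for illustration.
    """
    p = np.clip(np.asarray(teacher_probs, dtype=float), eps, 1.0)
    q = np.clip(softmax(student_logits, temperature), eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


def build_transfer_set(queries, counterfactuals):
    """Stack the attacker's queries with the counterfactuals returned for them.

    Assumption: each counterfactual is treated as one extra training point
    for the substitute model.
    """
    return np.vstack([queries, counterfactuals])
```

Under these assumptions, the loss is close to zero when the substitute reproduces the target's probabilities exactly and grows as the two distributions diverge, which is the sense in which a low loss on the transfer set corresponds to a high-fidelity substitute.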

Authors (3)

Fatima Ezzeddine
Omran Ayoub
Silvia Giordano
