Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models (2404.01231v1)
Abstract: It is commonplace to produce application-specific models by fine-tuning large pre-trained models on a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data are leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models (LLMs), demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
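To make the threat concrete, the sketch below shows a generic loss-threshold membership inference test of the kind such an attack would amplify: a poisoned pre-trained checkpoint that heightens memorization of the victim's fine-tuning data pushes member losses lower, raising the attacker's true-positive rate at a fixed threshold. This is a minimal, illustrative example, not the paper's attack; the model, data, and threshold are placeholder assumptions.

```python
# Minimal sketch of a loss-threshold membership inference test (illustrative
# only; not the paper's method). All names below are hypothetical stand-ins.
import torch
import torch.nn.functional as F


@torch.no_grad()
def per_sample_loss(model, x, y):
    """Black-box-style score: per-example cross-entropy of the fine-tuned model."""
    logits = model(x)
    return F.cross_entropy(logits, y, reduction="none")


@torch.no_grad()
def membership_guess(model, x, y, threshold):
    """Guess 'member' when the loss falls below the threshold.

    A backdoored pre-trained model that amplifies memorization of the
    victim's fine-tuning data would drive member losses lower, so more
    true members fall under the same threshold.
    """
    return per_sample_loss(model, x, y) < threshold


if __name__ == "__main__":
    # Toy stand-ins: a linear "fine-tuned model" and random candidate points.
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 4)
    x = torch.randn(8, 16)          # candidate inputs (membership unknown)
    y = torch.randint(0, 4, (8,))   # candidate labels
    print(membership_guess(model, x, y, threshold=1.0).tolist())
```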