Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning (2402.06674v3)
Abstract: We analyse the relationship between privacy vulnerability and dataset properties, such as examples per class and number of classes, when applying two state-of-the-art membership inference attacks (MIAs) to fine-tuned neural networks. We derive per-example MIA vulnerability in terms of score distributions and statistics computed from shadow models. We introduce a simplified model of membership inference and prove that, in this model, the logarithm of the difference between true and false positive rates depends linearly on the logarithm of the number of examples per class. We complement the theoretical analysis with a systematic empirical study of the practical privacy vulnerability of fine-tuning large image classification models, and recover the derived power-law dependence between the number of examples per class and MIA vulnerability, as measured by the true positive rate of the attack at a low false positive rate. Finally, we fit a parametric model of the derived form to predict the true positive rate from dataset properties and observe a good fit for MIA vulnerability on unseen fine-tuning scenarios.
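As a rough illustration of the power-law relationship described in the abstract, the sketch below fits a line in log-log space between the number of examples per class and the attack's true positive rate at a fixed low false positive rate, then extrapolates to an unseen shot count. The numbers are made up for illustration, and the fitting choice (ordinary least squares on log-transformed values via `numpy.polyfit`) is an assumption for this sketch, not the paper's exact parametric model.

```python
import numpy as np

# Illustrative (hypothetical) measurements: number of examples per class S
# and the MIA true positive rate at a fixed low false positive rate,
# as would be estimated empirically, e.g. with shadow models.
shots = np.array([1, 5, 10, 25, 50, 100])
tpr_at_low_fpr = np.array([0.62, 0.31, 0.19, 0.09, 0.05, 0.03])

# Power-law hypothesis: TPR ~ c * S^slope, i.e. log TPR is affine in log S.
slope, intercept = np.polyfit(np.log(shots), np.log(tpr_at_low_fpr), 1)
print(f"fitted power law: TPR ~ {np.exp(intercept):.2f} * S^({slope:.2f})")

# Extrapolate to an unseen shot count under the fitted model.
s_new = 200
print(f"predicted TPR at S={s_new}: {np.exp(intercept) * s_new ** slope:.3f}")
```

The paper's parametric model predicts the true positive rate from several dataset properties jointly; the single-variable fit above only illustrates the log-log linearity that the abstract refers to.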
Authors: Marlon Tobaben, Gauri Pradhan, Yuan He, Joonas Jälkö, Antti Honkela, Hibiki Ito