Differentially Private Representation Learning via Image Captioning (2403.02506v2)
Abstract: Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) from scratch on a 233M subset of LAION-2B using a reasonable amount of computation, and obtain image features of unprecedented quality that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving on the previous SOTA of $56.5\%$.