
Differentially Private Representation Learning via Image Captioning (2403.02506v2)

Published 4 Mar 2024 in cs.CV and cs.LG

Abstract: Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtain unprecedentedly high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving on the previous SOTA of $56.5\%$.
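
The abstract does not spell out the training algorithm. As general background, DP training of neural networks is commonly done with DP-SGD: each example's gradient is clipped to a fixed norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping norm is added before the update. The following is a minimal, illustrative sketch of one such step in plain Python. It is not the DP-Cap implementation; the names `clip_gradient` and `dp_sgd_step` and all parameter values are assumptions made for this example, and the privacy accounting that turns the noise level into a budget such as $\varepsilon=8$ is omitted.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Rescale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One generic DP-SGD update (hypothetical helper, not the paper's code).

    per_example_grads: list of gradients, one list of floats per example.
    noise_multiplier:  Gaussian noise std is noise_multiplier * clip_norm.
    rng:               a random.Random instance used to draw the noise.
    """
    batch_size = len(per_example_grads)
    # Clip each example separately so no single example dominates the sum.
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # Add independent Gaussian noise to every coordinate of the sum.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average over the batch and take a plain gradient step.
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
    print(dp_sgd_step([0.0, 0.0], grads, clip_norm=1.0,
                      noise_multiplier=1.0, lr=0.1, rng=rng))
```

In this sketch, clipping bounds how much any single example can move the parameters, which is what lets a fixed noise scale mask that example's contribution.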
