On Memorization in Diffusion Models (2310.02664v1)
Abstract: Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching, has a closed-form optimal solution that can only generate samples replicating the training data. This indicates that memorization behavior is theoretically expected, which contradicts the generalization ability commonly observed in state-of-the-art diffusion models, and thus calls for a deeper understanding. Looking into this, we first observe that memorization behaviors tend to occur on smaller-sized datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum. Then, we quantify the impact of the influential factors on these memorization behaviors in terms of EMM, focusing primarily on data distribution, model configuration, and training procedure. Besides comprehensive empirical results identifying the influential factors, we surprisingly find that conditioning training data on uninformative random labels can significantly trigger memorization in diffusion models. Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models. Code is available at https://github.com/sail-sg/DiffMemorize.
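The closed-form optimum mentioned in the abstract is the score of the Gaussian-smoothed empirical distribution over the training set: the optimal denoiser is a softmax-weighted average of the training points, so sampling with it can only reproduce training data as the noise level shrinks. The following is a minimal NumPy sketch of that optimum, offered only as an illustration of the idea; it is not the paper's released code, and the function names (`optimal_denoiser`, `optimal_score`) and toy data are assumptions made here.

```python
import numpy as np

def optimal_denoiser(x_t, train_data, sigma):
    """E[x_0 | x_t] when p_sigma(x_t) = (1/N) * sum_i N(x_t; x_i, sigma^2 I).

    This is the closed-form optimum of denoising score matching when the
    target distribution is the empirical distribution over the training set.
    """
    # Squared distance from the noisy input to every training point.
    d2 = np.sum((train_data - x_t) ** 2, axis=1)        # shape (N,)
    logits = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                         # softmax weights
    return w @ train_data                                # weighted average of training points

def optimal_score(x_t, train_data, sigma):
    """Score of the Gaussian-smoothed empirical distribution at x_t."""
    return (optimal_denoiser(x_t, train_data, sigma) - x_t) / sigma ** 2

# Toy usage (hypothetical 2-D "images"): with a small sigma, the softmax
# weights concentrate on the nearest training example, so the optimal
# denoiser snaps back to a training point, i.e. the optimum memorizes.
rng = np.random.default_rng(0)
train_data = rng.normal(size=(16, 2))                    # 16 training points in R^2
x_t = train_data[3] + 0.05 * rng.normal(size=2)          # a slightly noised copy of one point
print(optimal_denoiser(x_t, train_data, sigma=0.05))     # approximately train_data[3]
```

As the sketch suggests, a model that matched this optimum exactly could only replicate its training set, which is the tension with observed generalization that motivates the EMM measurement.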