Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts (2403.12326v3)

Published 18 Mar 2024 in cs.LG and cs.CV

Abstract: Diffusion models have demonstrated remarkable capability in generating high-quality visual content from textual descriptions. However, since these models are trained on large-scale internet data, they inevitably learn undesirable concepts, such as sensitive content, copyrighted material, and harmful or unethical elements. While previous works focus on permanently removing such concepts, this approach is often impractical, as it can degrade model performance and lead to irreversible loss of information. In this work, we introduce a novel concept-hiding approach that makes unwanted concepts inaccessible to public users while allowing controlled recovery when needed. Instead of erasing knowledge from the model entirely, we incorporate a learnable prompt into the cross-attention module, acting as a secure memory that suppresses the generation of hidden concepts unless a secret key is provided. This enables flexible access control -- ensuring that undesirable content cannot be easily generated while preserving the option to reinstate it under restricted conditions. Our method introduces a new paradigm where concept suppression and controlled recovery coexist, which was not feasible in prior works. We validate its effectiveness on the Stable Diffusion model, demonstrating that hiding concepts mitigates the risks of permanent removal while maintaining the model's overall capability.
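The abstract describes key-gated suppression at the cross-attention level but gives no implementation details. As a rough illustration of the idea, the toy NumPy sketch below appends a (pretend-trained) suppression prompt to the text conditioning whenever the correct secret key is absent; the dimensions, prompt values, key handling, and function names are all invented for this example and are not from the paper.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)
D = 8   # toy embedding dimension (real models use 768 or more)
P = 4   # number of learnable prompt tokens

# Hypothetical "secure memory": in the paper this prompt is learned so that
# its presence steers generation away from the hidden concepts.
suppression_prompt = rng.standard_normal((P, D))

# Only a hash of the secret key is stored, never the key itself (an assumption).
SECRET_KEY_HASH = hashlib.sha256(b"owner-secret").hexdigest()

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, text_emb, key=None):
    """Single-head toy cross-attention. Without the correct key, the
    suppression prompt is concatenated to the text conditioning, so the
    hidden concept stays inaccessible; with the key, attention sees only
    the original text embeddings."""
    unlocked = (key is not None
                and hashlib.sha256(key).hexdigest() == SECRET_KEY_HASH)
    context = text_emb if unlocked else np.vstack([text_emb, suppression_prompt])
    scores = query @ context.T / np.sqrt(D)
    return softmax(scores) @ context

# Demo: the same query yields different conditioning with and without the key.
query = rng.standard_normal((1, D))
text_emb = rng.standard_normal((3, D))        # stand-in for encoded prompt tokens
locked_out = cross_attention(query, text_emb)                        # prompt active
unlocked_out = cross_attention(query, text_emb, key=b"owner-secret") # prompt bypassed
```

Note that this only mimics the access-control structure; the actual method trains the prompt jointly with a suppression objective so that the locked path fails to render the hidden concept while unrelated generations are unaffected.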
