Ablating Concepts in Text-to-Image Diffusion Models

(2303.13516)
Published Mar 23, 2023 in cs.CV, cs.GR, and cs.LG

Abstract

Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.

Overview

  • The paper introduces a method for modifying text-to-image diffusion models to remove certain undesired concepts while preserving related ones.

  • The technique prevents models from generating specific content by shifting the target concept's distribution to an anchor concept's distribution.

  • Experiments on 16 ablation tasks demonstrate the method's effectiveness and efficiency, with model updates taking only about five minutes per concept.

  • The work is underpinned by a Kullback–Leibler divergence-based objective, minimized by fine-tuning model weights instead of full retraining (a minimal sketch of the resulting matching loss follows this list).

  • The paper provides a new avenue for controlling generative AI output ethically and has made its code and models available for public use.
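
To make the matching idea in the bullets above concrete, here is a minimal PyTorch sketch of the model-based objective: the trainable model's noise prediction for the target prompt (e.g., "Grumpy Cat") is regressed toward a frozen copy's prediction for the anchor prompt (e.g., "cat"). The module names (`unet`, `frozen_unet`, `text_encoder`) and the diffusers-style call signatures are assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def ablation_loss(unet, frozen_unet, text_encoder, noisy_latents, timesteps,
                  target_prompt_ids, anchor_prompt_ids):
    """Minimal sketch of a model-based concept-ablation loss (illustrative names).

    The trainable `unet` is conditioned on the target prompt (e.g., "Grumpy Cat"),
    while a frozen copy conditioned on the anchor prompt (e.g., "cat") provides the
    regression target, remapping the target prompt to the anchor distribution.
    """
    target_emb = text_encoder(target_prompt_ids)[0]   # text embeddings for target prompt
    anchor_emb = text_encoder(anchor_prompt_ids)[0]   # text embeddings for anchor prompt

    # Frozen teacher: what the pretrained model predicts for the anchor prompt.
    with torch.no_grad():
        anchor_pred = frozen_unet(noisy_latents, timesteps,
                                  encoder_hidden_states=anchor_emb).sample

    # Trainable student: prediction for the *target* prompt on the same noisy latents.
    target_pred = unet(noisy_latents, timesteps,
                       encoder_hidden_states=target_emb).sample

    # Match the two predictions so the target prompt yields anchor-like images.
    return F.mse_loss(target_pred, anchor_pred)
```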

Model-based Concept Ablation

Overview

A recently proposed method modifies pre-trained text-to-image diffusion models to ablate (remove) undesired concepts such as copyrighted material, memorized images, or specific art styles. The technique alters the model's conditional distribution so that generation for a particular target concept is redirected to an anchor concept; for example, the model produces generic cat images instead of "Grumpy Cat" when prompted with that phrase. Importantly, this approach preserves closely related concepts in the model's generations.

Ablating Concepts Efficiently

The primary challenge addressed by this method is preventing a diffusion model from generating specific concepts without retraining from scratch or degrading related concepts. This is tackled by aligning the image distribution of the target concept (the one to be ablated) with the distribution of an anchor concept. Two strategies are developed: one where the anchor distribution is defined by the pretrained model's output for the anchor concept, and another where the anchor distribution is induced by re-pairing the target concept prompt with images of the anchor concept.
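
The second strategy can be pictured as the ordinary denoising loss computed on anchor-concept images, but with the text condition swapped to the target prompt. The sketch below assumes a diffusers-style scheduler and UNet; `anchor_latents`, `scheduler.add_noise`, and the prompt handling are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def noise_based_ablation_loss(unet, text_encoder, scheduler,
                              anchor_latents, target_prompt_ids):
    """Sketch of the re-pairing variant (illustrative, diffusers-style names).

    Latents of ANCHOR-concept images are noised as in standard training, but the
    UNet is conditioned on the TARGET prompt, re-aligning the target text
    condition with the anchor image distribution.
    """
    noise = torch.randn_like(anchor_latents)
    timesteps = torch.randint(0, scheduler.config.num_train_timesteps,
                              (anchor_latents.shape[0],),
                              device=anchor_latents.device)
    noisy_latents = scheduler.add_noise(anchor_latents, noise, timesteps)

    target_emb = text_encoder(target_prompt_ids)[0]
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=target_emb).sample

    # Standard denoising objective, but with the mismatched (target) prompt.
    return F.mse_loss(noise_pred, noise)
```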

Empirical Validation

Extensive experiments validate the effectiveness of the proposed method across 16 ablation tasks, providing strong quantitative evidence that target concepts, including specific objects, styles, and memorized images, can be removed. For instance, the method successfully maps "Grumpy Cat" to the generic "cat" category with minimal effect on the model's ability to produce related cat breeds. The ablation process is also efficient, requiring only about five minutes per concept to update the model's weights.
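
One simple way to sanity-check an ablation, not necessarily the paper's exact evaluation protocol, is to score images generated from the ablated prompt against both the target and anchor texts with an off-the-shelf CLIP model; after successful ablation, the anchor text should dominate. The sketch below uses the openai/clip-vit-base-patch32 checkpoint from transformers purely as an illustrative check.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def clip_concept_scores(image_paths, target_text="Grumpy Cat", anchor_text="cat"):
    """Score generated images against the ablated (target) and anchor concepts.

    After a successful ablation, images generated from the target prompt should
    be scored closer to the anchor text than to the target text. Illustrative
    check only; not the paper's exact evaluation protocol.
    """
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[target_text, anchor_text], images=images,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image   # shape: (num_images, 2)
    return logits.softmax(dim=-1)               # col 0: target concept, col 1: anchor
```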

Theoretical Underpinning and Ablation Study

Under the hood, the work derives a Kullback–Leibler divergence-based objective and minimizes it by fine-tuning a subset of model weights rather than retraining the full model. The paper also discusses training objectives, choices of which parameter subsets to fine-tune, and robustness issues such as the model's sensitivity to misspelled prompts, reporting that fine-tuning cross-attention layers offers better robustness to spelling variations than fine-tuning embedding layers alone.
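
As a concrete illustration of restricting fine-tuning to cross-attention, the snippet below freezes a diffusers-style UNet except for the key/value projections that attend to the text embedding. The `attn2` / `to_k` / `to_v` naming follows common diffusers conventions and is an assumption here; the paper's exact parameter subset may differ.

```python
def select_cross_attention_params(unet):
    """Freeze all UNet weights except the cross-attention key/value projections.

    In diffusers-style UNets the cross-attention blocks are typically named
    `attn2`, with `to_k` / `to_v` projecting the text embedding. This naming is
    an assumption for illustration, not taken from the authors' code.
    """
    trainable = []
    for name, param in unet.named_parameters():
        if "attn2" in name and ("to_k" in name or "to_v" in name):
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Usage sketch: hand only the selected parameters to the optimizer.
# optimizer = torch.optim.AdamW(select_cross_attention_params(unet), lr=1e-5)
```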

Overall, the paper marks a significant step for generative AI, allowing a higher degree of control and ethical governance over diffusion model outputs. The code and models have been released, providing the community with tools to apply these methods to their own work.
