Ablating Concepts in Text-to-Image Diffusion Models

(2303.13516)
Published Mar 23, 2023 in cs.CV, cs.GR, and cs.LG

Abstract

Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.

Overview

  • The paper introduces a method for modifying text-to-image diffusion models to remove certain undesired concepts while preserving related ones.

  • The technique prevents models from generating specific content by shifting the target concept's distribution to an anchor concept's distribution.

  • Experiments on 16 ablation tasks demonstrate the method's effectiveness and efficiency, with model updates taking only about five minutes per concept.

  • The work is underpinned by a Kullback–Leibler divergence-based objective, minimized by fine-tuning model weights instead of full retraining (a minimal sketch of the resulting matching loss follows this list).

  • The paper provides a new avenue for controlling generative AI output ethically and has made its code and models available for public use.
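
To make the matching idea in the bullets above concrete, here is a minimal PyTorch sketch of the model-based objective: the trainable model's noise prediction for the target prompt (e.g., "Grumpy Cat") is regressed toward a frozen copy's prediction for the anchor prompt (e.g., "cat"). The module names (`unet`, `frozen_unet`, `text_encoder`) and the diffusers-style call signatures are assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def ablation_loss(unet, frozen_unet, text_encoder, noisy_latents, timesteps,
                  target_prompt_ids, anchor_prompt_ids):
    """Minimal sketch of a model-based concept-ablation loss (illustrative names).

    The trainable `unet` is conditioned on the target prompt (e.g., "Grumpy Cat"),
    while a frozen copy conditioned on the anchor prompt (e.g., "cat") provides the
    regression target, remapping the target prompt to the anchor distribution.
    """
    target_emb = text_encoder(target_prompt_ids)[0]   # text embeddings for target prompt
    anchor_emb = text_encoder(anchor_prompt_ids)[0]   # text embeddings for anchor prompt

    # Frozen teacher: what the pretrained model predicts for the anchor prompt.
    with torch.no_grad():
        anchor_pred = frozen_unet(noisy_latents, timesteps,
                                  encoder_hidden_states=anchor_emb).sample

    # Trainable student: prediction for the *target* prompt on the same noisy latents.
    target_pred = unet(noisy_latents, timesteps,
                       encoder_hidden_states=target_emb).sample

    # Match the two predictions so the target prompt yields anchor-like images.
    return F.mse_loss(target_pred, anchor_pred)
```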

Model-based Concept Ablation

Overview

A recently proposed method modifies pre-trained text-to-image diffusion models to ablate (remove) undesired concepts such as copyrighted material, memorized images, or specific art styles. The technique alters the model's conditional distribution so that generation for a particular target concept is redirected to an anchor concept; for example, the model produces generic cat images instead of "Grumpy Cat" when prompted with that phrase. Importantly, this approach preserves closely related concepts in the model's generations.

Ablating Concepts Efficiently

The primary challenge addressed by this method is preventing a diffusion model from generating specific concepts without retraining from scratch or degrading related concepts. This is tackled by aligning the image distribution of the target concept (the one to be ablated) with the distribution of an anchor concept. Two strategies are developed: one where the anchor distribution is defined by the pretrained model's output for the anchor concept, and another where the anchor distribution is induced by re-pairing the target concept prompt with images of the anchor concept.
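
The second strategy can be pictured as the ordinary denoising loss computed on anchor-concept images, but with the text condition swapped to the target prompt. The sketch below assumes a diffusers-style scheduler and UNet; `anchor_latents`, `scheduler.add_noise`, and the prompt handling are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def noise_based_ablation_loss(unet, text_encoder, scheduler,
                              anchor_latents, target_prompt_ids):
    """Sketch of the re-pairing variant (illustrative, diffusers-style names).

    Latents of ANCHOR-concept images are noised as in standard training, but the
    UNet is conditioned on the TARGET prompt, re-aligning the target text
    condition with the anchor image distribution.
    """
    noise = torch.randn_like(anchor_latents)
    timesteps = torch.randint(0, scheduler.config.num_train_timesteps,
                              (anchor_latents.shape[0],),
                              device=anchor_latents.device)
    noisy_latents = scheduler.add_noise(anchor_latents, noise, timesteps)

    target_emb = text_encoder(target_prompt_ids)[0]
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=target_emb).sample

    # Standard denoising objective, but with the mismatched (target) prompt.
    return F.mse_loss(noise_pred, noise)
```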

Empirical Validation

Extensive experiments validate the effectiveness of the proposed method across 16 ablation tasks, providing strong quantitative evidence that target concepts, including specific objects, styles, and memorized images, can be removed. For instance, the method successfully maps "Grumpy Cat" to the generic "cat" category with minimal effect on the model's ability to produce related cat breeds. The ablation process is also efficient, requiring only about five minutes per concept to update the model's weights.
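
One simple way to sanity-check an ablation, not necessarily the paper's exact evaluation protocol, is to score images generated from the ablated prompt against both the target and anchor texts with an off-the-shelf CLIP model; after successful ablation, the anchor text should dominate. The sketch below uses the openai/clip-vit-base-patch32 checkpoint from transformers purely as an illustrative check.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def clip_concept_scores(image_paths, target_text="Grumpy Cat", anchor_text="cat"):
    """Score generated images against the ablated (target) and anchor concepts.

    After a successful ablation, images generated from the target prompt should
    be scored closer to the anchor text than to the target text. Illustrative
    check only; not the paper's exact evaluation protocol.
    """
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[target_text, anchor_text], images=images,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image   # shape: (num_images, 2)
    return logits.softmax(dim=-1)               # col 0: target concept, col 1: anchor
```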

Theoretical Underpinning and Ablation Study

Under the hood, the work derives a Kullback–Leibler divergence-based objective and minimizes it by fine-tuning a subset of model weights rather than retraining the full model. The paper also discusses training objectives, choices of which parameter subsets to fine-tune, and robustness issues such as the model's sensitivity to misspelled prompts, reporting that fine-tuning cross-attention layers offers better robustness to spelling variations than fine-tuning embedding layers alone.
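
As a concrete illustration of restricting fine-tuning to cross-attention, the snippet below freezes a diffusers-style UNet except for the key/value projections that attend to the text embedding. The `attn2` / `to_k` / `to_v` naming follows common diffusers conventions and is an assumption here; the paper's exact parameter subset may differ.

```python
def select_cross_attention_params(unet):
    """Freeze all UNet weights except the cross-attention key/value projections.

    In diffusers-style UNets the cross-attention blocks are typically named
    `attn2`, with `to_k` / `to_v` projecting the text embedding. This naming is
    an assumption for illustration, not taken from the authors' code.
    """
    trainable = []
    for name, param in unet.named_parameters():
        if "attn2" in name and ("to_k" in name or "to_v" in name):
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Usage sketch: hand only the selected parameters to the optimizer.
# optimizer = torch.optim.AdamW(select_cross_attention_params(unet), lr=1e-5)
```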

Overall, the paper marks a significant step for generative AI, allowing a higher degree of control and ethical governance over diffusion model outputs. The code and models have been released, providing the community with tools to apply these methods to their own work.
