Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss (2401.02677v1)

Published 5 Jan 2024 in cs.CV and cs.AI

Abstract: Stable Diffusion XL (SDXL) has become the best open-source text-to-image (T2I) model for its versatility and top-notch image quality. Efficiently addressing the computational demands of SDXL models is crucial for wider reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter UNets, respectively, achieved through progressive removal using layer-level losses focusing on reducing the model size while preserving generative quality. We release these models' weights at https://hf.co/Segmind. Our methodology involves the elimination of residual networks and transformer blocks from the U-Net structure of SDXL, resulting in significant reductions in parameters and latency. Our compact models effectively emulate the original SDXL by capitalizing on transferred knowledge, achieving competitive results against the larger multi-billion-parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thus facilitating more accessible deployment in resource-constrained environments.


Summary

  • The paper introduces a progressive knowledge distillation method that compresses SDXL by strategically removing U-Net layers using layer-level loss.
  • Researchers demonstrate that distilled variants like SSD-1B and Segmind-Vega closely match the original model’s performance with faster inference.
  • Evaluations, including human preference studies, validate the compressed models' effectiveness and potential for broad AI applications.

Introduction to Model Compression

Stable Diffusion XL (SDXL) is a state-of-the-art text-to-image model widely admired for its image generation quality. However, its size demands considerable computational resources, which can be a barrier for many users. The paper presents an approach to model compression that produces scaled-down variants of SDXL: Segmind Stable Diffusion (SSD-1B), with a 1.3B-parameter U-Net, and Segmind-Vega, with a 0.74B-parameter U-Net. These variants aim to deliver comparable performance while improving accessibility and reducing computational load.
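
Since the distilled weights are publicly released, they can be loaded through the standard SDXL pipeline in the diffusers library. The sketch below is illustrative rather than taken from the paper; the repository id "segmind/SSD-1B" is inferred from the release page at https://hf.co/Segmind and may differ.

```python
# Minimal sketch: loading the distilled SSD-1B weights with diffusers.
# The repo id "segmind/SSD-1B" is an assumption based on https://hf.co/Segmind.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B",            # distilled 1.3B-parameter UNet variant
    torch_dtype=torch.float16,   # half precision to reduce memory use
    use_safetensors=True,
).to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```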

Knowledge Distillation Approach

The core of this compression method is knowledge distillation, a process in which a smaller model (the student) learns to replicate the behavior of a larger model (the teacher). The authors reduce model size by eliminating certain layers within SDXL's U-Net architecture, targeting residual networks and transformer blocks that account for a substantial share of the parameters. This removes redundancy without compromising image quality, and the paper shows that the technique preserves the high-quality generative capabilities of the original SDXL. The reduced-size models, released on popular machine learning platforms, illustrate the successful application of knowledge distillation at the layer level.
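
To make the layer-level objective concrete, the following is a minimal PyTorch sketch of a combined distillation loss, assuming the teacher's and student's intermediate feature maps at corresponding retained blocks have already been collected (e.g., via forward hooks). The function name and weighting terms are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_noise, teacher_noise, noise_target,
                      student_feats, teacher_feats,
                      w_out=1.0, w_feat=1.0):
    """Task loss + output-level + layer-level distillation (illustrative weights)."""
    # Standard denoising objective: the student predicts the true noise
    loss_task = F.mse_loss(student_noise, noise_target)
    # Output-level distillation: mimic the teacher's predicted noise
    loss_out = F.mse_loss(student_noise, teacher_noise)
    # Layer-level distillation: match intermediate feature maps at
    # corresponding retained blocks
    loss_feat = sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    return loss_task + w_out * loss_out + w_feat * loss_feat
```

Matching intermediate activations, rather than only final outputs, gives the smaller U-Net a denser training signal for the blocks it retains.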

Efficient Diffusion Models and Training

Investigating the efficient adaptation of diffusion models, the researchers adopted a methodical pruning strategy, rigorously evaluating which layers could be omitted. They removed layers whose absence had minimal impact on image generation quality, confirmed through both human evaluation and heuristic methods. Training details indicate that the models were optimized for high-resolution imagery and trained with mixed precision on powerful GPUs, reflecting the computational effort involved. Even so, the compression methods employed dramatically decreased both the training steps and the resources needed.
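
A greedy ablation loop of the following shape captures the spirit of such a pruning strategy. This is a hypothetical sketch: the helpers `remove_block` and `quality_score` stand in for "drop one residual or transformer block" and "evaluate generation quality", and are not functions from the paper.

```python
# Hypothetical greedy layer-ablation sketch; `remove_block` and
# `quality_score` are illustrative placeholders, not the paper's code.
from typing import Callable, List, Tuple

def prune_unet(unet, candidate_blocks: List[str],
               remove_block: Callable, quality_score: Callable,
               tol: float = 0.02) -> Tuple[object, List[str]]:
    baseline = quality_score(unet)             # quality before any removal
    removed: List[str] = []
    for block in candidate_blocks:
        candidate = remove_block(unet, block)  # drop one block and re-wire
        score = quality_score(candidate)
        if baseline - score <= tol:            # negligible quality drop: accept
            unet, removed = candidate, removed + [block]
            baseline = score                   # re-baseline after accepting
    return unet, removed
```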

Evaluation and Implications

Comparative evaluations highlight the potential of model compression. SSD-1B and Segmind-Vega performed impressively, benchmarking close to the larger SDXL's output with significantly faster inference times. These findings were reinforced by a comprehensive human preference study, in which the distilled SSD-1B model was even slightly favored over SDXL. The results not only underscore the feasibility of compressing complex generative models but also suggest that such methods may apply to other large machine learning models. The paper concludes by noting the importance of strong parent models in distillation and suggests future work on distilling other major AI models.
