Emergent Mind

Abstract

Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Existing distillation techniques generally fall into two distinct categories: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, each suffers from either severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically combines the advantages of ODE Trajectory Preservation and Reformulation while maintaining near-lossless performance during step compression. First, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistency distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Second, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Third, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support inference at all step counts. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aesthetic Score for 1-step inference.

[Figure] Qualitative comparison of UNet-based acceleration methods on the SDXL architecture.

Overview

  • Hyper-SD introduces a novel framework combining trajectory-preservation and trajectory-reformulation in diffusion models, utilizing techniques like TSCD and human feedback learning to enhance efficiency and output quality.

  • The methodology includes three primary enhancements: Trajectory Segmented Consistency Distillation (TSCD), Human Feedback Learning, and One-step Generation Enhancement via Score Distillation.

  • Experimental results demonstrate Hyper-SD's superior performance in aesthetic quality and textual fidelity, outperforming existing methods across various diffusion model architectures.

  • Future work includes preserving Classifier-Free Guidance and tailoring feedback optimization to accelerated models, indicating strong potential for real-world applications.

Enhancing Diffusion Model Step Efficiency through Hyper-SD, a Novel Distillation Framework

Overview of Hyper-SD

Hyper-SD introduces a novel approach that combines trajectory-preservation and trajectory-reformulation techniques within diffusion models (DMs). This unified framework leverages Trajectory Segmented Consistency Distillation (TSCD), human feedback learning, and score distillation to achieve state-of-the-art (SOTA) performance on Stable Diffusion models such as SDXL and SD1.5 over a reduced number of inference steps, ranging from 1 to 8.

Methodology

Hyper-SD's methodology centers on three primary enhancements to the diffusion model distillation process:

Trajectory Segmented Consistency Distillation (TSCD):

  • The proposed TSCD divides the diffusion trajectory into smaller segments, facilitating a more granular and effective distillation process.
  • This approach minimizes model fitting complexity, mitigating the degradation in generation quality and preserving the fidelity of the original model's trajectory across various segments.
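The segmentation idea above can be sketched in a few lines. The helper below is illustrative (the function name and boundary arithmetic are assumptions, not the paper's code): it splits the 1,000 training timesteps typical of Stable Diffusion into equal segments, and the segment count is halved at each training stage, so consistency is first enforced locally before being extended toward the full trajectory.

```python
def segment_boundaries(num_train_timesteps, num_segments):
    """Split the timestep range [0, T) into equal consistency segments.

    Within each segment the student learns to map any timestep to the
    segment endpoint, so the ODE trajectory is compressed locally rather
    than collapsed to a single global endpoint in one shot.
    """
    step = num_train_timesteps // num_segments
    return [(i * step, (i + 1) * step) for i in range(num_segments)]

# Progressive schedule (illustrative): halve the segment count each
# stage; the final single-segment stage reduces to ordinary
# consistency distillation over the whole trajectory.
schedule = [segment_boundaries(1000, k) for k in (8, 4, 2, 1)]
print(schedule[0][:2])  # first two segments of the 8-segment stage
```

The progressive coarsening is what keeps the fitting problem tractable at each stage: each student only needs to be consistent over a short stretch of the teacher's trajectory at a time.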

Human Feedback Learning:

  • This involves adjusting model outputs based on human aesthetic preferences and feedback from visual perceptual models to improve generation quality.
  • The implementation uses aesthetic predictors and instance segmentation models to refine structure and aesthetic appeal, guiding the model toward producing visually pleasing and structurally coherent outputs.
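The structure of such a feedback objective can be sketched as follows. This is a hypothetical sketch: the toy `aes` and `perc` functions stand in for the real learned aesthetic predictor and instance-segmentation-based perceptual model, which are not specified here; only the reward-weighting shape is the point.

```python
import numpy as np

def feedback_loss(images, aesthetic_score, perceptual_score,
                  w_aes=1.0, w_perc=1.0):
    """Combine rewards from an aesthetic predictor and a visual
    perceptual model; minimizing the negative mean reward nudges the
    distilled model toward higher-scoring outputs."""
    rewards = (w_aes * aesthetic_score(images)
               + w_perc * perceptual_score(images))
    return -np.mean(rewards)

# Toy stand-ins (assumptions, not the paper's reward models):
aes = lambda x: x.mean(axis=(1, 2, 3))    # per-image mean intensity
perc = lambda x: -x.std(axis=(1, 2, 3))   # penalize noisy structure

batch = np.random.rand(4, 3, 64, 64)      # NCHW batch of fake images
loss = feedback_loss(batch, aes, perc)
```

In practice the gradient of such a loss is backpropagated through the few-step sampler into the student, which is what lets human-preference signal compensate for quality lost during distillation.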

Score Distillation for One-step Generation Enhancement:

  • Incorporates a Distribution Matching Distillation (DMD) technique targeting enhancements specifically for one-step inference, optimizing the estimation of the score function and thus improving generation quality from minimal inference steps.
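The core DMD mechanic can be illustrated on a one-dimensional Gaussian toy problem: the generator gradient is the difference between a "fake" score (refit to the student's current samples) and the teacher's "real" score. This is a mean-matching sketch under unit-variance Gaussian assumptions, not the paper's implementation.

```python
import numpy as np

def dmd_gradient(x, score_real, score_fake):
    """DMD generator gradient (sketch): difference between the score of
    the student's output distribution and the teacher's distribution."""
    return score_fake(x) - score_real(x)

# 1-D toy: teacher distribution is N(0, 1); the 'fake' score is refit
# to the student's current samples each step (here: just their mean).
score_real = lambda x: 0.0 - x            # score of N(0, 1)
samples = np.array([4.0, 6.0])            # student's one-step outputs
for _ in range(200):
    mu = samples.mean()
    score_fake = lambda x, mu=mu: mu - x  # score of N(mu, 1)
    samples = samples - 0.05 * dmd_gradient(samples, score_real,
                                            score_fake)

# The sample mean converges to the teacher's mean (0). This toy matches
# means only; full DMD matches whole distributions via learned scores.
print(samples.mean())
```

The appeal for one-step generation is that this objective never requires simulating the full sampling trajectory: it only compares score estimates at the student's outputs.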

Experimental Results

Extensive experiments and a user study were conducted, showing that Hyper-SD achieves superior performance in both aesthetic quality and textual fidelity across different diffusion model architectures:

  • Metrics Utilized: CLIP Score, Aesthetic Score, and specialized metrics such as ImageReward and PickScore were used to quantitatively assess performance.
  • Comparison to Baselines: Hyper-SD displayed noticeable improvements over existing methods like SDXL-Lightning and various adversarial and trajectory-based distillation techniques.
  • User Study Findings: Hyper-SD was preferred significantly more often than competing methods, reinforcing the effectiveness of the proposed enhancements.
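For context on the first metric: a common CLIP Score variant is just a scaled cosine similarity between the CLIP image and text embeddings. The 100x scale below is a frequent convention (an assumption here; implementations differ on scaling and clipping).

```python
import numpy as np

def clip_score(img_emb, txt_emb, scale=100.0):
    """Scaled cosine similarity between CLIP image and text embeddings;
    higher means the generated image matches the prompt better."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return scale * float(img @ txt)

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
print(clip_score(a, a))  # identical embeddings -> maximal score
print(clip_score(a, b))  # orthogonal embeddings -> zero score
```

On this scale, the reported +0.68 gap over SDXL-Lightning at 1 step corresponds to a small but consistent improvement in prompt fidelity.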

Implications and Future Work

The practical implications of Hyper-SD are profound for real-world applications requiring efficient and high-quality image generation from textual prompts. The ability to operate effectively across a reduced number of inference steps without compromising output quality can lead to more resource-efficient deployments of generative models.

Looking ahead, future developments might focus on:

  • Maintaining Classifier-Free Guidance (CFG): Ensuring the model can utilize negative prompts effectively while still functioning under accelerated conditions.
  • Custom Feedback Optimization: Tailoring feedback learning mechanisms specifically for accelerated models to enhance performance further.
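To make the CFG point concrete: classifier-free guidance extrapolates the noise prediction away from the unconditional branch, and negative prompts enter through that unconditional term. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one. Negative prompts are
    supplied via eps_uncond, which is why distilled students that bake
    in a fixed guidance scale tend to lose negative-prompt support."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

u = np.array([0.0, 1.0])     # prediction for the (negative) prompt
c = np.array([1.0, 3.0])     # prediction for the positive prompt
print(cfg_noise(u, c, 7.5))  # a typical Stable Diffusion scale
```

At `guidance_scale=1` the expression reduces to the conditional prediction alone, which is effectively what many accelerated models are locked to; restoring a tunable scale is the open problem the bullet describes.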

Conclusion

Hyper-SD marks a significant advance in the field of generative AI, particularly in the optimization of diffusion models for fewer-step inference with high fidelity and aesthetic quality. It sets a new standard for efficiency in model performance, paving the way for both academic exploration and practical applications in AI-driven image generation.
