Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
(arXiv 2404.13686)
Abstract
Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.
Overview
- Hyper-SD introduces a novel framework combining trajectory preservation and trajectory reformulation in diffusion models, using techniques such as TSCD and human feedback learning to enhance efficiency and output quality.
- The methodology comprises three primary enhancements: Trajectory Segmented Consistency Distillation (TSCD), Human Feedback Learning, and One-step Generation Enhancement via Score Distillation.
- Experimental results demonstrate Hyper-SD's superior aesthetic quality and textual fidelity, outperforming existing methods across diffusion model architectures.
- Future work includes maintaining Classifier-Free Guidance and tailoring feedback optimization to accelerated models, indicating potential for significant real-world applications.
Enhancing Diffusion Model Step Efficiency through Hyper-SD, a Novel Distillation Framework
Overview of Hyper-SD
Hyper-SD introduces a novel approach that amalgamates both trajectory-preservation and trajectory-reformulation techniques within diffusion models (DMs). This unified framework leverages trajectory segmented consistency distillation (TSCD), human feedback learning, and score distillation to achieve state-of-the-art (SOTA) performance on Stable Diffusion models such as SDXL and SD1.5 with a reduced number of inference steps, ranging from 1 to 8.
Methodology
Hyper-SD's methodology centers on three primary enhancements to the diffusion model distillation process:
Trajectory Segmented Consistency Distillation (TSCD):
- The proposed TSCD divides the diffusion trajectory into smaller segments, facilitating a more granular and effective distillation process.
- This approach minimizes model fitting complexity, mitigating the degradation in generation quality and preserving the fidelity of the original model's trajectory across various segments.
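The segmentation idea behind TSCD can be sketched in a few lines. This is an illustrative toy, not the paper's exact schedule: it assumes a discrete training range of `num_steps` timesteps split into equal segments, and the helper names `segment_boundaries` and `consistency_target` are hypothetical. Within a segment, the student is trained to be self-consistent only down to that segment's start, rather than all the way to t=0, which preserves the original ODE trajectory piecewise.

```python
def segment_boundaries(num_steps: int, num_segments: int) -> list[int]:
    """Split the timestep range [0, num_steps) into equal segments.

    Returns the segment start points, e.g. 1000 steps / 4 segments
    -> [0, 250, 500, 750]. (Illustrative; the paper progressively
    reduces the segment count, e.g. 8 -> 4 -> 2 -> 1, until ordinary
    full-trajectory consistency distillation is recovered.)
    """
    width = num_steps // num_segments
    return [i * width for i in range(num_segments)]


def consistency_target(t: int, boundaries: list[int]) -> int:
    """Map timestep t to the start of its segment: the consistency
    objective only has to bridge this shorter sub-trajectory, which
    reduces fitting difficulty compared to mapping every t to 0."""
    return max(b for b in boundaries if b <= t)
```

For example, with 1000 steps and 4 segments, a sample at t=600 is distilled toward the boundary at 500 rather than toward 0.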
Human Feedback Learning:
- This involves adjusting model outputs based on human aesthetic preferences and feedback from visual perceptual models to improve generation quality.
- The implementation uses aesthetic predictors and instance segmentation models to refine structure and aesthetic appeal, guiding the model toward producing visually pleasing and structurally coherent outputs.
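A minimal sketch of how such feedback signals might be combined, assuming (hypothetically) that the aesthetic predictor and the perceptual model each emit a scalar reward for a generated image; the function name and weights are illustrative, not from the paper.

```python
def feedback_loss(aesthetic_reward: float,
                  structural_reward: float,
                  w_aes: float = 0.5,
                  w_struct: float = 0.5) -> float:
    """Combine rewards from an aesthetic predictor and a perceptual
    model (e.g. one based on instance segmentation) into one scalar.
    Minimizing the negative weighted reward pushes the generator
    toward outputs that both models rate highly."""
    return -(w_aes * aesthetic_reward + w_struct * structural_reward)
```

In practice the rewards would come from differentiable reward models so their gradients can flow back into the generator; this toy version only shows the weighting scheme.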
Score Distillation for One-step Generation Enhancement:
- Incorporates a Distribution Matching Distillation (DMD) technique targeting one-step inference specifically, improving the estimate of the score function and thus the quality of generations from minimal inference steps.
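The core of the DMD idea can be illustrated on a 1-D toy problem: the generator update direction at a sample is the difference between the score of the generator-induced ("fake") distribution and the score of the real (teacher) distribution. The Gaussian setup and function names below are hypothetical simplifications for illustration only.

```python
def gaussian_score(x: float, mu: float, var: float) -> float:
    """Score (d/dx of log p) of a 1-D Gaussian N(mu, var)."""
    return -(x - mu) / var


def dmd_gradient(x: float, real_mu: float, fake_mu: float,
                 var: float = 1.0) -> float:
    """Distribution-matching gradient at a generated sample x:
    the fake score minus the real score. Descending this gradient
    moves samples (and hence the generator's distribution) toward
    the real distribution."""
    return gaussian_score(x, fake_mu, var) - gaussian_score(x, real_mu, var)
```

For instance, if the generator currently outputs samples near 2 while the real data sits near 0, the gradient at x=2 is positive, so a gradient-descent step moves the sample toward the real mean.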
Experimental Results
Extensive experiments and a user study were conducted, showing that Hyper-SD achieves superior performance in both aesthetic quality and textual fidelity across different diffusion model architectures:
- Metrics Utilized: CLIP Score, Aesthetic Score, and preference-based metrics such as ImageReward and PickScore were used to quantitatively assess performance.
- Comparison to Baselines: Hyper-SD displayed noticeable improvements over existing methods like SDXL-Lightning and various adversarial and trajectory-based distillation techniques.
- User Study Findings: Hyper-SD was preferred significantly more often compared to other methods, reinforcing the effectiveness of the proposed enhancements.
Implications and Future Work
The practical implications of Hyper-SD are profound for real-world applications requiring efficient and high-quality image generation from textual prompts. The ability to operate effectively across a reduced number of inference steps without compromising output quality can lead to more resource-efficient deployments of generative models.
Looking ahead, future developments might focus on:
- Maintaining Classifier-Free Guidance (CFG): Ensuring the model can utilize negative prompts effectively while still functioning under accelerated conditions.
- Custom Feedback Optimization: Tailoring feedback learning mechanisms specifically for accelerated models to enhance performance further.
Conclusion
Hyper-SD marks a significant advance in the field of generative AI, particularly in the optimization of diffusion models for fewer-step inference with high fidelity and aesthetic quality. It sets a new standard for efficiency in model performance, paving the way for both academic exploration and practical applications in AI-driven image generation.