SDXL-Lightning: Progressive Adversarial Diffusion Distillation (2402.13929v3)
Published 21 Feb 2024 in cs.CV, cs.AI, and cs.LG
Abstract: We propose a diffusion distillation method that achieves a new state of the art in one-step and few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to balance quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models as both LoRA and full UNet weights.
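To make the "progressive" part of the abstract concrete, here is a minimal sketch of how a progressive distillation schedule might be laid out: each stage trains a student to match its teacher in fewer sampling steps, and the student then becomes the next teacher. The function name `progressive_schedule` and the step-halving rule are assumptions for illustration (halving is the convention in the original progressive distillation work); the abstract does not state which stages SDXL-Lightning actually uses, and the adversarial objective is not modeled here.

```python
from typing import List, Tuple


def progressive_schedule(start_steps: int, end_steps: int = 1) -> List[Tuple[int, int]]:
    """Return (teacher_steps, student_steps) pairs for each distillation stage.

    Hypothetical helper: assumes the step count is halved at every stage,
    which is an illustrative choice, not the schedule reported in the paper.
    """
    if end_steps < 1 or start_steps < end_steps:
        raise ValueError("require start_steps >= end_steps >= 1")

    stages = []
    teacher = start_steps
    while teacher > end_steps:
        # The student targets half the teacher's steps, but never fewer than end_steps.
        student = max(teacher // 2, end_steps)
        stages.append((teacher, student))
        # The trained student becomes the teacher for the next stage.
        teacher = student
    return stages


if __name__ == "__main__":
    for teacher_steps, student_steps in progressive_schedule(8):
        print(f"distill {teacher_steps}-step teacher into {student_steps}-step student")
```

In a real training loop, each pair would correspond to one distillation run, with the loss for that run (in this paper, an adversarial one) deciding how closely the student must track the teacher.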