Emergent Mind

Improving the Training of Rectified Flows

(2405.20320)
Published May 30, 2024 in cs.CV, cs.AI, and cs.LG

Abstract

Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 72% in the 1 NFE setting on CIFAR-10. On ImageNet 64×64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.

Rectified flow process diagram based on Liu et al.'s 2022 work.

Overview

  • The paper introduces novel training techniques for rectified flows, enhancing performance in low NFE regimes and allowing them to compete with state-of-the-art distillation methods.

  • Key innovations include the finding that a single iteration of the Reflow algorithm suffices, together with a U-shaped timestep distribution and an LPIPS-Huber premetric as the training loss.

  • Empirical evaluation on datasets like CIFAR-10 and ImageNet 64×64 shows significant improvements in image quality and computational efficiency, with a reduction in FID and enhanced perceptual quality of generated images.

Improved Training Techniques for Rectified Flows: Enhancing Low NFE Performance

Rectified flows have emerged as a compelling alternative to diffusion models for image and video generation tasks, particularly when emphasis is placed on reducing the number of function evaluations (NFEs) required for generation. The paper under review introduces a suite of novel training techniques aimed at significantly enhancing the performance of rectified flows, enabling them to compete with state-of-the-art distillation methods such as consistency distillation (CD) and progressive distillation (PD) even in low NFE regimes.
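For orientation, the core rectified-flow objective can be sketched in a few lines: a velocity model is regressed onto the constant velocity of the straight interpolation between a noise endpoint and a data endpoint. The NumPy snippet below is a minimal illustration under these standard definitions, not the paper's implementation; `velocity_model`, the toy data, and the "oracle" model are all hypothetical.

```python
import numpy as np

def rectified_flow_loss(velocity_model, x0, x1, t):
    """Rectified-flow objective: regress onto the velocity of the straight
    path x_t = (1 - t) * x0 + t * x1, whose constant velocity is x1 - x0."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over data dims
    xt = (1.0 - t) * x0 + t * x1               # point on the straight path
    target = x1 - x0                           # ground-truth velocity
    pred = velocity_model(xt, t)
    return np.mean((pred - target) ** 2)       # squared l2 premetric

# Toy check: a model that outputs the exact constant velocity has zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))   # "noise" endpoints
x1 = rng.standard_normal((4, 8))   # "data" endpoints
t = rng.uniform(size=4)

oracle = lambda xt, t: x1 - x0     # hypothetical perfect velocity model
loss = rectified_flow_loss(oracle, x0, x1, t)
```

In practice the squared $\ell_2$ premetric shown here is exactly what the paper proposes to replace, as discussed below.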

Key Findings and Innovations

One-Round Reflow Sufficiency

A central claim presented in this work is that a single iteration of the Reflow algorithm is sufficient to learn nearly straight trajectories. Previous methods employed multiple iterations of Reflow, which increased the computational burden and often resulted in error accumulation. By improving the training process, this paper demonstrates that a single Reflow iteration can achieve comparable or better sample quality.
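A single Reflow round can be sketched as follows: integrate the current flow's ODE from noise to data to form deterministic (noise, sample) couplings, then retrain the next flow on those straighter couplings. The NumPy sketch below is illustrative only; the Euler solver, the toy linear velocity field, and all names are assumptions rather than the paper's code.

```python
import numpy as np

def euler_sample(velocity, z, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x, dt = z.copy(), 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x

def reflow_pairs(velocity, rng, n, dim, num_steps=100):
    """One Reflow round: generate deterministic (noise, sample) couplings
    from the current flow; the next flow is trained on these pairs, whose
    paths are straighter than those of the independent coupling."""
    z = rng.standard_normal((n, dim))
    x_hat = euler_sample(velocity, z, num_steps)
    return z, x_hat

# Toy check with a linear field v(x, t) = -x: the exact ODE solution is
# x(1) = exp(-1) * z, so the couplings should contract toward the origin.
rng = np.random.default_rng(0)
z, x_hat = reflow_pairs(lambda x, t: -x, rng, n=16, dim=4)
```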

Enhanced Training Techniques

The following improvements were proposed for training rectified flows:

  1. U-shaped Timestep Distribution: This distribution focuses more training effort on the challenging timesteps by adopting a non-uniform (specifically, U-shaped) sampling strategy. Empirical evaluation on datasets such as CIFAR-10 shows significant improvements, with a 28% reduction in FID compared to uniform timestep distribution.
  2. LPIPS-Huber Premetric: Replacing the traditional squared $\ell_2$ distance, the LPIPS-Huber premetric combines perceptual similarity (LPIPS) with the outlier robustness of a pseudo-Huber penalty. This improves the perceptual quality of generated images and reduces FID by up to 50% on certain datasets.
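The two ideas above can be illustrated together. In the sketch below, Beta(0.5, 0.5) (the arcsine law) stands in as a generic U-shaped density, which is not necessarily the paper's exact distribution, and a plain pseudo-Huber premetric is computed in pixel space; the paper composes the Huber form with LPIPS features, and the learned perceptual network is omitted here.

```python
import numpy as np

def u_shaped_timesteps(rng, n):
    """Sample timesteps from a U-shaped density on [0, 1].
    Beta(0.5, 0.5) is used purely as an illustrative U-shape that puts
    more mass near the endpoints t = 0 and t = 1."""
    return rng.beta(0.5, 0.5, size=n)

def pseudo_huber(pred, target, c=0.03):
    """Pseudo-Huber premetric: behaves like squared l2 near zero error and
    like l1 for large errors, damping outlier gradients. (The paper applies
    the Huber form on LPIPS features; that network is omitted here.)"""
    sq = np.sum((pred - target) ** 2, axis=tuple(range(1, pred.ndim)))
    return np.mean(np.sqrt(sq + c * c) - c)

rng = np.random.default_rng(0)
t = u_shaped_timesteps(rng, 100_000)
# Fraction of sampled timesteps near the endpoints, the "hard" timesteps.
edge_mass = np.mean((t < 0.1) | (t > 0.9))

x = rng.standard_normal((4, 8))
zero = pseudo_huber(x, x)   # identical inputs give zero loss
```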

Theoretical and Practical Implications

These improvements shift the paradigm of training rectified flows, allowing them to provide high-quality samples with fewer computational resources. The primary implication is a newfound ability to compete with distilled diffusion models and other state-of-the-art methods in both one-step and two-step settings. The enhancements also hold potential for applications beyond image generation, including image editing tasks and watermarking, where inversion capabilities of rectified flows are beneficial.
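The inversion capability mentioned above follows from the flow being a deterministic ODE: running the same integrator with time reversed maps a sample back toward its noise. The toy sketch below, with a hypothetical constant velocity field, shows why nearly straight trajectories make few-step (here, one-step) sampling and inversion essentially exact; all names are illustrative assumptions.

```python
import numpy as np

def integrate(velocity, x, t0, t1, num_steps):
    """Euler-integrate dx/dt = v(x, t) from t0 to t1; t1 < t0 inverts the flow."""
    dt = (t1 - t0) / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, t0 + i * dt)
    return x

# For a perfectly straight flow the velocity is constant along each path,
# so a single Euler step is exact in both directions. Toy constant field:
rng = np.random.default_rng(0)
c = np.ones(4)                                                # constant velocity
z = rng.standard_normal((8, 4))                               # noise
x = integrate(lambda x, t: c, z, 0.0, 1.0, num_steps=1)       # one-step sample
z_rec = integrate(lambda x, t: c, x, 1.0, 0.0, num_steps=1)   # one-step inversion
```

For a learned, only approximately straight field, a handful of Euler steps plays the same role, which is what makes few-step inversion for editing practical.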

Empirical Evaluation

In rigorous experiments on CIFAR-10 and ImageNet 64×64 datasets, the enhanced rectified flows demonstrated:

  • A reduction of up to 72% in FID.
  • Comparable or superior performance to distillation-based methods, where 2-rectified flow achieved an FID of 3.38 in a single step, outperforming existing methods like consistency distillation (CD) and progressive distillation (PD).

In addition to quantitative improvements, qualitative advantages were observed in applications requiring few-step inversion and image-to-image translation tasks, showcasing the capability for realistic noise inversions with minimal NFE.

Future Directions in AI Research

This work opens avenues for further refinement of generative ODE-based models. The empirical results suggest that leveraging advanced solvers or integrating learning-based solvers could further optimize the trade-off between sample quality and sampling speed. These developments provide a foundation for more computationally efficient and perceptually robust image generation techniques. Future research may also explore the integration of these improved techniques in broader contexts and other generative frameworks, paving the way for more practical and versatile AI models.

Conclusion

This paper represents a significant step towards making rectified flows a viable alternative to current distillation-based methods in the low NFE regime. By introducing improved training techniques, including specialized timestep distributions and enhanced objective functions, this research demonstrates the potential for rectified flows to achieve state-of-the-art performance in image generation tasks. These contributions enhance our understanding of how to efficiently train generative models and open new possibilities for their applications in AI.

While challenges remain in streamlining training processes and achieving parity with the best consistency models, the insights and innovations presented provide a strong foundation for future advancements. The ability to generate high-quality samples with fewer computational resources not only advances the technical landscape of AI but also holds promise for more accessible and efficient deployment of generative models across varied applications.
