
Abstract

Diffusion Models (DMs) and Consistency Models (CMs) are two popular families of generative models with strong generation quality across a variety of tasks. When training DMs and CMs, intermediate weight checkpoints are not fully utilized; only the final converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin that cannot be reached by SGD but can be obtained by proper checkpoint averaging. Based on these observations, we propose LCSC, a simple but effective and efficient method to enhance the performance of DMs and CMs by combining checkpoints along the training trajectory with coefficients deduced from an evolutionary search. We demonstrate the value of LCSC through two use cases: $\textbf{(a) Reducing training cost.}$ With LCSC, we only need to train DMs/CMs for fewer iterations and/or with smaller batch sizes to obtain sample quality comparable to that of the fully trained model. For example, LCSC achieves considerable training speedups for CMs (23$\times$ on CIFAR-10 and 15$\times$ on ImageNet-64). $\textbf{(b) Enhancing pre-trained models.}$ Assuming full training is already done, LCSC can further improve the generation quality or speed of the final converged models. For example, LCSC achieves better performance with one function evaluation (NFE = 1) than the base model with 2 NFE under consistency distillation, and decreases the NFE of DMs from 15 to 9 while maintaining generation quality on CIFAR-10. Our code is available at https://github.com/imagination-research/LCSC.

LCSC linearly combines saved training checkpoints, with coefficients found by evolutionary search, to cut training cost and enhance model performance.

Overview

  • The paper introduces Linear Combination of Saved Checkpoints (LCSC) as a method to improve generative models by utilizing intermediate checkpoints.

  • LCSC is shown to substantially reduce training time and to surpass the performance of models trained with conventional methods.

  • A comprehensive experimental validation shows LCSC's effectiveness in enhancing both the speed and quality of generative models like DMs and CMs.

  • The study suggests that the methodology could offer new insights into model training dynamics and open avenues for further research in generative modeling.

Enhancing Generative Models through Linear Combination of Checkpoints: A Study on Consistency and Diffusion Models

Introduction

Generative modeling has witnessed significant advances with the advent of Diffusion Models (DMs) and Consistency Models (CMs), both demonstrating compelling performance across a variety of tasks. A common practice when training these models is to use only the last converged weight checkpoint for generation, an approach that overlooks the valuable information embedded in intermediate checkpoints. This paper investigates a novel methodology, termed Linear Combination of Saved Checkpoints (LCSC), that exploits these intermediate checkpoints to reach the generative quality of fully trained models at lower cost, or to surpass it.

Observations and Motivations

An investigation into the training dynamics of DMs and CMs reveals that the trajectory traversed in weight space contains many checkpoints that, if appropriately combined, yield model performance unreachable by traditional optimization routes such as Stochastic Gradient Descent (SGD) and its variants. Moreover, although Exponential Moving Average (EMA) is widely applied to stabilize training, the findings show that its fixed averaging scheme is sub-optimal, leaving clear room for improvement.
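To make the contrast with EMA concrete, the sketch below (illustrative code, not from the paper; the helpers `ema_coefficients` and `combine` are hypothetical) shows that EMA implicitly fixes the combination coefficients to an exponentially decaying schedule, whereas LCSC treats them as free parameters.

```python
import numpy as np

def ema_coefficients(num_checkpoints: int, decay: float = 0.999) -> np.ndarray:
    # The EMA recursion w_n = decay * w_{n-1} + (1 - decay) * theta_n unrolls to
    # w_n = sum_i (1 - decay) * decay**(n - i) * theta_i (plus a decayed init term),
    # i.e. a fixed, exponentially decaying set of combination coefficients.
    return np.array([(1 - decay) * decay ** (num_checkpoints - i)
                     for i in range(1, num_checkpoints + 1)])

def combine(checkpoints, coeffs):
    # Generic linear combination of checkpoints (flattened weight vectors).
    # LCSC searches `coeffs` freely instead of fixing them to the EMA schedule.
    return sum(c * w for c, w in zip(coeffs, checkpoints))
```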

LCSC: Methodology

LCSC frames checkpoint combination as an optimization problem in a low-dimensional search space: it optimizes a small number of combination coefficients over selected checkpoints so as to improve generative quality as measured by established metrics such as the Fréchet Inception Distance (FID). By employing evolutionary search to determine these coefficients, LCSC circumvents the limitations of gradient-based methods, particularly for objectives that are non-differentiable or computationally intensive to differentiate. The method proves effective both in reducing the computational cost of training strong models and in enhancing the performance of fully trained ones.
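As a rough illustration of how such a search could be run (a minimal sketch, not the authors' implementation; `checkpoints`, `evaluate_fid`, the mutation scheme, and all hyperparameters are assumptions), a simple evolutionary loop over combination coefficients might look like:

```python
import numpy as np

def evolutionary_search(checkpoints, evaluate_fid,
                        population=16, generations=50, sigma=0.05, seed=0):
    """Search linear-combination coefficients over saved checkpoints to minimize FID.

    `checkpoints` is assumed to be a list of flattened weight vectors saved along the
    training trajectory; `evaluate_fid` builds a model from a weight vector and
    returns its FID (lower is better).
    """
    rng = np.random.default_rng(seed)
    n = len(checkpoints)

    def combine(coeffs):
        # Weighted sum of checkpoints; coefficients are normalized to sum to 1.
        return sum(c * w for c, w in zip(coeffs, checkpoints))

    # Start from uniform averaging and keep the best coefficient vector found so far.
    best = np.full(n, 1.0 / n)
    best_fid = evaluate_fid(combine(best))

    for _ in range(generations):
        for _ in range(population):
            cand = best + sigma * rng.standard_normal(n)   # mutate the current best
            cand /= cand.sum()                             # re-normalize
            fid = evaluate_fid(combine(cand))
            if fid < best_fid:                             # keep improvements only
                best, best_fid = cand, fid
    return best, best_fid
```

Because each candidate is scored only by generating samples and computing the metric, such a search requires no back-propagation, which is what makes it applicable to non-differentiable objectives such as FID.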

Experimental Validations

A comprehensive set of experiments across two primary use cases (reducing training cost and enhancing pre-trained models) demonstrates LCSC's efficacy. Notably, for consistency models trained with consistency distillation (CD) on CIFAR-10, LCSC reaches a significantly better FID than the base models with far fewer training iterations, corresponding to more than a 14-fold acceleration in training. Furthermore, when applied to pre-trained DMs and CMs, LCSC consistently improves sample quality or speeds up generation, showcasing its potential for refining the output capabilities of these models.

Theoretical Implications and Future Directions

The results obtained with LCSC suggest that the weight space of DMs and CMs contains rich structure and basins of high performance that are not readily accessible through conventional training. The ability of LCSC to locate these basins by linearly combining checkpoints opens new avenues for understanding and exploiting the training dynamics of generative models. Future work may extend the approach to other generative models and neural networks, further advancing efficient and effective model training and optimization.

Conclusion

This paper introduces LCSC, a promising technique that, by harnessing intermediate weight checkpoints, significantly enhances the performance of generative models, notably DMs and CMs. The method offers a novel perspective on optimizing generative model performance, providing both practical benefits in computational efficiency and theoretical insight into the landscape of model weights. Its utility in accelerating training and in enhancing pre-trained models, demonstrated through rigorous experimentation, makes LCSC a valuable contribution to the field of generative modeling.
