Enhancing Accuracy in Generative Models via Knowledge Transfer (2405.16837v3)
Abstract: This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribution metrics such as the Kullback-Leibler divergence. This framework underscores the importance of leveraging inherent similarities between diverse tasks despite their distinct data distributions. Our theory suggests that shared structures can improve the generation accuracy of a target task, contingent on the source model's ability to identify those structures and on effective knowledge transfer from source to target learning. To demonstrate the practical utility of this framework, we explore its theoretical implications for two specific generative models: diffusion models and normalizing flows. The results show enhanced performance in both models over their non-transfer counterparts, indicating advancements for diffusion models and providing fresh insights into normalizing flows in both transfer and non-transfer settings. These results highlight the significant contribution of knowledge transfer in boosting the generation capabilities of these models.
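To make the transfer setup concrete, the following is a minimal sketch, not taken from the paper, of the workflow the abstract describes: a source model is pre-trained, its shared representation is reused, and only a target-specific component is fine-tuned on scarce target data. The module names (`SharedEmbedding`, `ScoreHead`), the denoising surrogate loss, and the toy data are assumptions introduced purely for illustration.

```python
# Illustrative sketch only: fine-tune a target generative model from a
# source-pretrained model by reusing a hypothetical "shared embedding".
import torch
import torch.nn as nn


class SharedEmbedding(nn.Module):
    """Hypothetical representation assumed to be shared by source and target tasks."""
    def __init__(self, dim_in=8, dim_emb=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_emb))

    def forward(self, x):
        return self.net(x)


class ScoreHead(nn.Module):
    """Task-specific head, e.g. a noise-prediction network on top of the embedding."""
    def __init__(self, dim_emb=16, dim_out=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_emb, 64), nn.ReLU(), nn.Linear(64, dim_out))

    def forward(self, z):
        return self.net(z)


def denoising_loss(embed, head, x, sigma=0.5):
    # Simple denoising surrogate: predict the Gaussian noise added to x.
    noise = torch.randn_like(x)
    pred = head(embed(x + sigma * noise))
    return ((pred - noise) ** 2).mean()


def pretrain_source(embed, head, x_source, steps=200, lr=1e-3):
    # Train embedding and head jointly on (plentiful) source data.
    opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        denoising_loss(embed, head, x_source).backward()
        opt.step()
    return embed


def finetune_target(embed, x_target, steps=200, lr=1e-3, freeze_embedding=True):
    # Reuse the source-pretrained shared embedding; fit a fresh target head.
    head = ScoreHead()
    params = list(head.parameters())
    if freeze_embedding:
        for p in embed.parameters():
            p.requires_grad_(False)
    else:
        params += list(embed.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        denoising_loss(embed, head, x_target).backward()
        opt.step()
    return head


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy source/target data sharing a low-dimensional structure (an assumption).
    basis = torch.randn(3, 8)
    x_source = torch.randn(2048, 3) @ basis
    x_target = torch.randn(256, 3) @ basis + 0.1  # shifted target task, few samples
    embed = pretrain_source(SharedEmbedding(), ScoreHead(), x_source)
    target_head = finetune_target(embed, x_target)
    with torch.no_grad():
        print("fine-tuned target loss:", denoising_loss(embed, target_head, x_target).item())
```

In this sketch the frozen embedding plays the role of the shared structure identified on the source task; whether freezing or jointly fine-tuning it is preferable is exactly the kind of question the paper's theory addresses, and nothing here should be read as the authors' actual algorithm.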