Synthetic Data from Diffusion Models Improves ImageNet Classification (2304.08466v1)

Published 17 Apr 2023 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Deep generative models are becoming increasingly powerful, now generating diverse high-fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models with SOTA FID (1.76 at 256x256 resolution) and Inception Score (239 at 256x256). The model also yields a new SOTA in Classification Accuracy Scores (64.96 for 256x256 generative samples, improving to 69.24 for 1024x1024 samples). Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy over strong ResNet and Vision Transformer baselines.

Citations (238)


Summary

The paper shows that fine-tuning the Imagen text-to-image diffusion model on ImageNet produces a class-conditional generator whose samples are useful as training data for classification.
It achieves state-of-the-art metrics with an FID of 1.76 and an Inception Score of 239 at 256×256 resolution.
The study reports Classification Accuracy Scores (CAS) rising from 64.96% for 256×256 samples to 69.24% for 1024×1024 samples, and shows that adding generated samples to the real ImageNet training set improves accuracy over strong ResNet and Vision Transformer baselines, underscoring synthetic data's potential for complex tasks.


The paper contributes to the field of generative data augmentation by adapting a large-scale text-to-image diffusion model to ImageNet. By fine-tuning an existing diffusion-based generative model, Imagen, on ImageNet data, the authors show that synthetic data augmentation can improve accuracy on a challenging discriminative task.

Generative Model Training and Metrics

The research starts from Imagen, a large-scale text-to-image model originally trained on a large and diverse image-text dataset, and fine-tunes it for class-conditional generation on ImageNet. The fine-tuned model achieves state-of-the-art (SOTA) Fréchet Inception Distance (FID) and Inception Score (IS) at 256×256 resolution: an FID of 1.76 (lower is better) and an IS of 239 (higher is better). On both metrics the adapted Imagen model surpasses previously reported generative models. Notably, this performance is achieved without architectural modifications, which points to the strength of large-scale pre-training followed by domain-specific fine-tuning.
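For readers unfamiliar with the two metrics, the following is a minimal NumPy sketch of their standard definitions. It is not the paper's evaluation pipeline. FID fits a Gaussian to feature vectors of real and generated images and computes ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^(1/2)). IS is exp(E_x[KL(p(y|x) || p(y))]) over a classifier's per-image class probabilities. The sketch assumes that Inception-v3 features and probabilities have already been extracted elsewhere. The function names (`gaussian_stats`, `frechet_distance`, `inception_score`) are hypothetical.

```python
import numpy as np


def gaussian_stats(features):
    """Mean and covariance of an (N, D) array of image features."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^(1/2)) equals the sum of square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative up to
    # numerical noise for covariance matrices (clip tiny negatives to 0).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def inception_score(probs, eps=1e-12):
    """IS from an (N, num_classes) array of per-image probabilities p(y|x)."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


# Toy usage with random stand-in "features" (not real Inception features).
rng = np.random.default_rng(0)
mu_r, cov_r = gaussian_stats(rng.standard_normal((2000, 8)))
mu_g, cov_g = gaussian_stats(rng.standard_normal((2000, 8)) + 0.5)
print(frechet_distance(mu_r, cov_r, mu_g, cov_g))
```

FID is sensitive to how many samples are used and to the exact feature extractor. Values are therefore only comparable when computed under the same protocol.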

Improvements in Classification Tasks

Crucially, the model sets a new SOTA in Classification Accuracy Score (CAS), which measures the accuracy on real ImageNet images of a classifier trained only on generated samples. For 256×256 generated samples, the model reaches a CAS of 64.96%, and this improves to 69.24% with 1024×1024 samples. These results narrow the gap between classifiers trained on generated data and those trained on real data, a key shortcoming of synthetic data in previous work. Beyond CAS, augmenting the real ImageNet training set with samples from the fine-tuned models yields significant accuracy improvements over strong ResNet and Vision Transformer baselines.
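The CAS protocol is simple: train on synthetic data only, then evaluate on real data. The toy sketch below illustrates it, assuming Gaussian clusters as stand-ins for both the generator and the real dataset and a nearest-centroid model as a stand-in for the ResNet-style classifiers a real evaluation would use. All names here (`sample_class_conditional`, `classification_accuracy_score`, and so on) are hypothetical and do not come from the paper.

```python
import numpy as np


def fit_nearest_centroid(x, y, num_classes):
    """Toy classifier: one mean feature vector per class."""
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])


def predict(centroids, x):
    # Squared Euclidean distance from every example to every class centroid.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def classification_accuracy_score(x_syn, y_syn, x_real, y_real, num_classes):
    """CAS: fit only on generated data, report top-1 accuracy on real data."""
    centroids = fit_nearest_centroid(x_syn, y_syn, num_classes)
    return float((predict(centroids, x_real) == y_real).mean())


def sample_class_conditional(rng, means, n_per_class, noise):
    # Hypothetical stand-in for a class-conditional generator (or a real
    # dataset): Gaussian clusters around per-class mean vectors.
    ys = np.repeat(np.arange(len(means)), n_per_class)
    xs = means[ys] + noise * rng.standard_normal((len(ys), means.shape[1]))
    return xs, ys


rng = np.random.default_rng(0)
real_means = 3.0 * rng.standard_normal((10, 16))
# An imperfect "generator": its class means are slightly off from the real ones.
gen_means = real_means + 0.3 * rng.standard_normal(real_means.shape)

x_syn, y_syn = sample_class_conditional(rng, gen_means, 200, noise=1.0)
x_real, y_real = sample_class_conditional(rng, real_means, 100, noise=1.0)
cas = classification_accuracy_score(x_syn, y_syn, x_real, y_real, num_classes=10)
print(f"CAS on toy data: {cas:.3f}")
```

A generator that produced the wrong content for a class label would drive CAS toward zero even if its samples looked realistic. This is why CAS complements sample-quality metrics such as FID and IS.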

Implications and Future Perspectives

These findings have implications for both theory and practice. From a theoretical standpoint, the results encourage further exploration of scaling generative models and of large-scale pre-training before domain-specific adaptation. Practically, the results suggest that synthetic data can be an effective tool for complex tasks like ImageNet classification, a domain that has historically required real data with extensive and precise annotation.

Future work could examine why high-resolution samples, when downsampled, improve classification accuracy, as observed in this paper. Understanding how the benefit changes as larger volumes of synthetic data are mixed with real data could also help tune the augmentation process. Given these promising results, techniques that balance quality and diversity in generated data are likely to be especially valuable in scenarios with limited access to labeled datasets.
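The two knobs mentioned above are downsampling high-resolution samples and controlling how much synthetic data is mixed with real data. This summary does not say how the authors implemented either, so the following is only a hypothetical NumPy sketch. It assumes box-filter (average-pool) downsampling by an integer factor (for example 1024×1024 to 256×256 with factor 4) and a target fraction of synthetic examples in the combined training set. The helper names are illustrative, not from the paper.

```python
import numpy as np


def downsample_avg(images, factor):
    """Box-filter downsampling of (N, H, W, C) images by an integer factor."""
    n, h, w, c = images.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Split each spatial axis into (blocks, factor) and average within blocks.
    return images.reshape(n, h // factor, factor, w // factor, factor, c).mean(axis=(2, 4))


def mix_real_and_synthetic(x_real, y_real, x_syn, y_syn, syn_fraction, rng):
    """Keep all real data and add enough synthetic examples that roughly
    `syn_fraction` of the shuffled result is synthetic."""
    if not 0.0 <= syn_fraction < 1.0:
        raise ValueError("syn_fraction must be in [0, 1)")
    n_syn = int(round(len(x_real) * syn_fraction / (1.0 - syn_fraction)))
    n_syn = min(n_syn, len(x_syn))  # cannot use more synthetic data than exists
    idx = rng.choice(len(x_syn), size=n_syn, replace=False)
    x = np.concatenate([x_real, x_syn[idx]])
    y = np.concatenate([y_real, y_syn[idx]])
    perm = rng.permutation(len(x))
    return x[perm], y[perm]


# Toy usage: 32x32 "generated" images downsampled to match 8x8 "real" images.
rng = np.random.default_rng(0)
x_real, y_real = rng.random((30, 8, 8, 3)), rng.integers(0, 10, 30)
x_syn = downsample_avg(rng.random((100, 32, 32, 3)), factor=4)
y_syn = rng.integers(0, 10, 100)
x_train, y_train = mix_real_and_synthetic(x_real, y_real, x_syn, y_syn, 0.5, rng)
print(x_train.shape, y_train.shape)
```

Sweeping `syn_fraction` while holding the real data fixed is one simple way to study the mixing question raised above.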

Overall, the paper provides a detailed analysis supporting the use of diffusion-based models for generative data augmentation, highlighting their role in improving the accuracy of deep learning classification models.
