Generative AI in Vision: A Survey on Models, Metrics and Applications (2402.16369v1)
Abstract: Generative AI models have transformed many fields by enabling the creation of realistic and diverse data samples. Among these models, diffusion models have emerged as a powerful approach for generating high-quality images, text, and audio. This survey provides a comprehensive overview of diffusion models and legacy generative models, focusing on their underlying techniques, their applications across different domains, and their challenges. We examine the theoretical foundations of diffusion models, including denoising diffusion probabilistic models (DDPM) and score-based generative modeling. We also explore applications of these models in text-to-image generation, image inpainting, and image super-resolution, among others, showcasing their potential in creative tasks and data augmentation. By synthesizing existing research and highlighting critical advancements, this survey aims to give researchers and practitioners a comprehensive understanding of diffusion and legacy generative models and to inspire future innovation in this area of artificial intelligence.