Generative Artificial Intelligence: A Systematic Review and Applications (2405.11029v1)
Abstract: In recent years, the study of AI has undergone a paradigm shift, propelled by the groundbreaking capabilities of generative models in both supervised and unsupervised learning scenarios. Generative AI has shown state-of-the-art performance on challenging real-world problems in fields such as image translation, medical diagnostics, textual imagery fusion, and natural language processing. This paper presents a systematic review and analysis of recent advancements and techniques in generative AI, with a detailed discussion of their applications, including application-specific models. To date, the major impact of generative AI has been in language generation, through the development of LLMs, in image translation, and in several other interdisciplinary applications. The primary contribution of this paper is a coherent synthesis of the latest advancements in these areas, together with an exploration of the future trajectory of generative AI. The paper concludes with a discussion of Responsible AI principles and the ethical considerations necessary for the sustainability and growth of these generative models.