AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment (2306.04717v2)
Abstract: With the rapid advancement of text-to-image generative models, AI-generated images (AGIs) have been widely applied in entertainment, education, social media, and other areas. However, because quality varies widely among AGIs, there is an urgent need for quality models that are consistent with human subjective ratings. To address this issue, we consider a wide range of popular AGI models, generate AGIs with different prompts and model parameters, and collect subjective scores for both perceptual quality and text-to-image alignment, thus building AGIQA-3K, the most comprehensive AGI subjective quality database so far. Furthermore, we conduct a benchmark experiment on this database to evaluate the consistency between current Image Quality Assessment (IQA) models and human perception, and we propose StairReward, which significantly improves the assessment of subjective text-to-image alignment. We believe that the fine-grained subjective scores in AGIQA-3K will inspire subsequent AGI quality models to fit human subjective perception mechanisms at both the perception and alignment levels, and will help optimize the generation results of future AGI models. The database is released at https://github.com/lcysyzxdxc/AGIQA-3k-Database.
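To make the idea of "consistency between an IQA model and human perception" concrete, the following is a minimal sketch that scores agreement with the Spearman rank correlation, a measure commonly used in IQA benchmarks. The abstract does not state which consistency measure the benchmark uses, so the choice of metric is an assumption here, and the function names and the score values are hypothetical illustrations rather than data from AGIQA-3K.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical subjective scores and model predictions for five images.
subjective_scores = [1.2, 3.4, 2.8, 4.5, 3.9]
model_predictions = [0.10, 0.55, 0.60, 0.90, 0.70]

print(f"rank correlation: {spearman(subjective_scores, model_predictions):.3f}")
```

A value near 1 means the model orders images the way human raters do, while a value near 0 means its ordering is unrelated to theirs. Because only ranks are compared, the model's output scale does not need to match the rating scale.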