Prompt Expansion for Adaptive Text-to-Image Generation (2312.16720v1)

Published 27 Dec 2023 in cs.CV

Abstract: Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
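The abstract describes a two-stage flow: a user's text query is first rewritten by the Prompt Expansion model into several expanded prompts, and each expanded prompt is then passed to a downstream text-to-image model to produce a more varied set of images. The sketch below illustrates that flow only; the function names (expand_prompt, generate_image) and the template-based expansion are illustrative placeholders assumed for this example, not the paper's trained model or any specific API.

```python
# Minimal sketch of the Prompt Expansion pipeline described in the abstract:
# a query is rewritten into several expanded prompts, each of which is sent
# to a text-to-image model. The calls below are placeholders -- the paper
# trains a text-to-text model for expansion, not the toy templates used here.

from typing import List


def expand_prompt(query: str, n: int = 4) -> List[str]:
    """Stand-in for the Prompt Expansion model (hypothetical interface).

    A real implementation would sample n expansions from a trained
    text-to-text model conditioned on the query.
    """
    # Toy style templates, used only to make the sketch runnable.
    styles = ["watercolor painting", "studio photograph",
              "isometric 3D render", "charcoal sketch"]
    return [f"{query}, {style}, detailed, high quality" for style in styles[:n]]


def generate_image(prompt: str) -> str:
    """Stand-in for a text-to-image model call (hypothetical interface)."""
    return f"<image generated from: {prompt!r}>"


def prompt_expansion_pipeline(query: str, n: int = 4) -> List[str]:
    """Expand the query, then render one image per expanded prompt."""
    expanded = expand_prompt(query, n=n)
    return [generate_image(p) for p in expanded]


if __name__ == "__main__":
    for image in prompt_expansion_pipeline("a cabin in the woods"):
        print(image)
```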

References (27)
  1. Palm 2 technical report.
  2. Non-commitment in mental imagery. Cognition, 238:105498.
  3. Promptify: Text-to-image generation through interactive prompt exploration with large language models.
  4. PaLI: A jointly-scaled multilingual language-image model. In The Eleventh International Conference on Learning Representations.
  5. PaLM: Scaling Language Modeling with Pathways. In arXiv:2001.08361.
  6. Scaling instruction-finetuned language models.
  7. CLIP-Interrogator. Clip-interrogator. https://github.com/pharmapsychotic/clip-interrogator.
  8. Guillem Collell and Marie-Francine Moens. 2016. Is an Image Worth More than a Thousand Words? On the Fine-Grain Semantic Differences between Visual and Linguistic Representations. In COLING.
  9. Gradio. Gradio. https://www.gradio.app/.
  10. Optimizing prompts for text-to-image generation.
  11. Prompt-to-Prompt Image Editing with Cross Attention Control. In arXiv preprint arXiv:2208.01626.
  12. Clipscore: A reference-free evaluation metric for image captioning. arXiv preprint arXiv:2104.08718.
  13. Jonathan Ho and Tim Salimans. 2021. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications.
  14. Underspecification in scene description-to-depiction tasks. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1172–1184, Online only. Association for Computational Linguistics.
  15. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning, pages 4904–4916. PMLR.
  16. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157.
  17. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  18. Ranjita Naik and Besmira Nushi. 2023. Social biases through the text-to-image generation lens. arXiv preprint arXiv:2304.06034.
  19. Hierarchical Text-Conditional Image Generation with CLIP Latents. In arXiv:2204.06125.
  20. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation.
  21. Photorealistic text-to-image diffusion models with deep language understanding.
  22. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In NeurIPS.
  23. UL2: Unifying language learning paradigms. In The Eleventh International Conference on Learning Representations.
  24. Neural text generation with unlikelihood training. In International Conference on Learning Representations.
  25. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery.
  26. Coca: Contrastive captioners are image-text foundation models. Transactions on Machine Learning Research.
  27. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. In arXiv:2206.10789.