Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation (2312.03540v1)

Published 6 Dec 2023 in cs.CV

Abstract: Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images. However, these generated images often exhibit an artistic or surreal quality that diverges from the authenticity of real-world food representations. This inadequacy renders them impractical for applications requiring realistic food imagery, such as training models for image-based dietary assessment. To address these limitations, we introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions. The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs. Additionally, we propose and employ two distinct data cleaning methodologies to ensure that the resulting image-text pairs maintain both realism and accuracy. The FoodFusion model, thus trained, demonstrates a remarkable ability to generate food images that exhibit a significant improvement in terms of both realism and diversity over the publicly available image generation models. We openly share the dataset and fine-tuned models to support advancements in this critical field of food image synthesis at https://bit.ly/genai4good.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (9)
  1. High-resolution image synthesis with latent diffusion models, 2021.
  2. Segment anything. arXiv:2304.02643, 2023.
  3. Nutritionverse: Empirical study of various dietary intake estimation approaches, 2023.
  4. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, 2014.
  5. Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network. In Proceedings of the 28th ACM International Conference on Multimedia, 2020.
  6. Large scale visual food recognition, 2023.
  7. Nutrition5k: Towards automatic nutritional understanding of generic food. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8903–8911, 2021.
  8. Visual aware hierarchy based food recognition, 2020.
  9. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023.
Citations (3)

Summary

We haven't generated a summary for this paper yet.