
Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions (2306.09203v1)

Published 15 Jun 2023 in cs.CV

Abstract: Food image segmentation is an important task with widespread applications, such as estimating the nutritional value of a plate of food. Although machine learning models have been used for segmentation in this domain, food images pose several challenges. One challenge is that food items can overlap and mix, making them difficult to distinguish. Another is the high degree of inter-class similarity and intra-class variability, which is caused by the varying preparation methods and dishes in which a food item may be served. Additionally, class imbalance is an inevitable issue in food datasets. To address these issues, two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT). The models are trained and evaluated using the FoodSeg103 dataset, which is identified as a robust benchmark for food image segmentation. The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103. This study provides insights into transferring knowledge using convolution and Transformer-based approaches in the food image domain.
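
The headline metric in the abstract, mean intersection over union (mIoU), can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name `mean_iou`, the nested-list label maps, and the optional `ignore_index` argument are assumptions made for illustration. It returns a fraction in [0, 1], so a reported score such as 49.4 would correspond to 0.494 on this scale.

```python
from typing import List, Optional


def mean_iou(
    pred: List[List[int]],
    target: List[List[int]],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean IoU over the classes that appear in the prediction or the target.

    pred and target are 2D label maps of equal shape (hypothetical format).
    Pixels whose target label equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if ignore_index is not None and t == ignore_index:
                continue
            if p == t:
                # A correct pixel counts once in both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # A wrong pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1

    # Average per-class IoU, skipping classes absent from both maps.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [[0, 0, 1], [1, 2, 2]]
    target = [[0, 1, 1], [1, 2, 0]]
    print(mean_iou(pred, target, num_classes=3))
```

Because every class contributes equally to the average regardless of how many pixels it covers, rare classes weigh as much as common ones, which is why class imbalance matters for this metric.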

Authors (7)
  1. Grant Sinha
  2. Krish Parmar
  3. Hilda Azimi
  4. Amy Tai
  5. Yuhao Chen
  6. Alexander Wong
  7. Pengcheng Xi
