
Extracting Human Attention through Crowdsourced Patch Labeling (2403.15013v1)

Published 22 Mar 2024 in cs.CV and cs.HC

Abstract: In image classification, bias in datasets is a significant problem. When a dataset contains only specific types of images, the classifier begins to rely on shortcuts: simplistic and erroneous rules for decision-making. This leads to high performance on the training dataset but inferior results on new, varied images, because the classifier's generalization capability is reduced. For example, if the images labeled as mustache consist solely of male figures, the model may inadvertently learn to classify images by gender rather than by the presence of a mustache. One approach to mitigating such biases is to direct the model's attention toward the target object's location, usually annotated with bounding boxes or polygons. However, collecting such annotations requires substantial time and human effort. Therefore, we propose a novel patch-labeling method that integrates AI assistance with crowdsourcing to capture human attention from images, which can be a viable solution for mitigating bias. Our method consists of two steps. First, we extract the approximate location of a target using a pre-trained saliency detection model, supplemented by human verification for accuracy. Then, we determine the human-attentive area in the image by iteratively dividing the image into smaller patches and employing crowdsourcing to ascertain whether each patch can be classified as the target object. We demonstrate the effectiveness of our method in mitigating bias through improved classification accuracy and the refined focus of the model. In addition, crowdsourced experiments validate that our method collects human annotations up to 3.4 times faster than annotating object locations with polygons, significantly reducing the need for human resources. We conclude the paper by discussing the advantages of our method in a crowdsourcing context, focusing mainly on human errors and accessibility.
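The second step described in the abstract, iteratively dividing the image into smaller patches and asking the crowd whether each patch shows the target, can be illustrated with a rough sketch. This is not the authors' implementation. The 2x2 quadrant split, the strict-majority vote, the fixed `max_depth`, the choice to start from the whole image rather than from a saliency-derived location, and every function name below (`split_into_quadrants`, `majority_vote`, `label_patches`, `make_simulated_crowd`) are assumptions made for illustration. Real crowd workers are replaced by a simulated oracle.

```python
from typing import Callable, List, Tuple

Patch = Tuple[int, int, int, int]  # (top, left, height, width) in pixels
CrowdFn = Callable[[Patch], List[bool]]  # hypothetical: one yes/no per worker


def majority_vote(votes: List[bool]) -> bool:
    """Aggregate worker answers (assumed rule: strict majority says yes)."""
    return sum(votes) * 2 > len(votes)


def split_into_quadrants(patch: Patch) -> List[Patch]:
    """Divide a patch into four sub-patches (assumed 2x2 split)."""
    top, left, h, w = patch
    h1, w1 = h // 2, w // 2
    return [
        (top, left, h1, w1),
        (top, left + w1, h1, w - w1),
        (top + h1, left, h - h1, w1),
        (top + h1, left + w1, h - h1, w - w1),
    ]


def label_patches(height: int, width: int, ask_crowd: CrowdFn,
                  max_depth: int = 3) -> List[List[bool]]:
    """Refine the attended area by repeatedly splitting accepted patches."""
    frontier: List[Patch] = [(0, 0, height, width)]
    for _ in range(max_depth):
        # Keep only non-empty sub-patches that the crowd accepts as target.
        frontier = [
            sub
            for patch in frontier
            for sub in split_into_quadrants(patch)
            if sub[2] > 0 and sub[3] > 0 and majority_vote(ask_crowd(sub))
        ]
    # Paint the surviving patches into a boolean attention mask.
    mask = [[False] * width for _ in range(height)]
    for top, left, h, w in frontier:
        for r in range(top, top + h):
            for c in range(left, left + w):
                mask[r][c] = True
    return mask


def make_simulated_crowd(truth: List[List[bool]], n_workers: int = 3) -> CrowdFn:
    """Stand-in for real workers: answers yes if the patch overlaps the target."""
    def ask(patch: Patch) -> List[bool]:
        top, left, h, w = patch
        overlaps = any(truth[r][c]
                       for r in range(top, top + h)
                       for c in range(left, left + w))
        return [overlaps] * n_workers
    return ask


if __name__ == "__main__":
    # Toy 8x8 image whose target occupies rows 2-3, columns 2-3.
    truth = [[2 <= r < 4 and 2 <= c < 4 for c in range(8)] for r in range(8)]
    mask = label_patches(8, 8, make_simulated_crowd(truth), max_depth=3)
    print(mask == truth)
```

Because rejected patches are never split again, the number of questions grows with the size of the target region and not with the size of the whole image. That is one plausible reason a patch-based scheme could be cheaper than drawing polygons, though the abstract does not state the mechanism behind its speedup.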

