Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems (2403.01722v1)
Abstract: Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of AI in domain-specific professional tasks, it has become increasingly important to help beginning annotators identify how their early-stage knowledge can lead to inaccurate answers, which in turn helps ensure quality annotations at scale. To investigate this issue, we conducted a formative study with eight individuals from the field of disaster management, each with a different level of expertise. The goal was to understand the prevalent factors contributing to disagreements among annotators when classifying Twitter messages related to disasters and to analyze their respective responses. Our analysis identified two primary causes of disagreement between expert and beginner annotators: 1) a lack of contextual knowledge or uncertainty about the situation, and 2) the absence of visual or supplementary cues. Based on these findings, we designed a Context interface, which generates aids that help beginners identify potential mistakes and that supply the hidden context of the presented tweet. A summative study compared the Context design with two designs widely used in data annotation UIs, the Highlight and Reasoning-based interfaces. We found significant differences among these designs in both attitudinal and behavioral data. We conclude with implications for designing future interfaces that aim to close the knowledge gap among annotators.