
Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations (2402.19133v1)

Published 29 Feb 2024 in cs.CL

Abstract: Rationales in the form of manually annotated input spans usually serve as ground truth when evaluating explainability methods in NLP. They are, however, time-consuming and often biased by the annotation process. In this paper, we debate whether human gaze, in the form of webcam-based eye-tracking recordings, poses a valid alternative when evaluating importance scores. We evaluate the additional information provided by gaze data, such as total reading times, gaze entropy, and decoding accuracy with respect to human rationale annotations. We compare WebQAmGaze, a multilingual dataset for information-seeking QA, with attention and explainability-based importance scores for 4 different multilingual Transformer-based LLMs (mBERT, distil-mBERT, XLMR, and XLMR-L) and 3 languages (English, Spanish, and German). Our pipeline can easily be applied to other tasks and languages. Our findings suggest that gaze data offers valuable linguistic insights that could be leveraged to infer task difficulty and further show a comparable ranking of explainability methods to that of human rationales.

Authors (5)
  1. Stephanie Brandl
  2. Oliver Eberle
  3. Tiago Ribeiro
  4. Anders Søgaard
  5. Nora Hollenstein

Summary

  • The paper investigates whether webcam-based eye-tracking can serve as a viable alternative to manual rationale annotations in NLP explainability.
  • It uses the multilingual WebQAmGaze dataset to compare gaze metrics with traditional attention scores across English, Spanish, and German.
  • Findings reveal that despite hardware limitations, gaze data provide robust linguistic insights and consistent ranking of model explanations.

Assessing Webcam-based Eye-tracking as a Viable Alternative for Annotating Rationales in NLP Explainability

Introduction

In NLP and Explainable AI (XAI), annotated rationales have been a cornerstone for evaluating the effectiveness and reliability of models. Manually annotating these rationales, however, is not only time-consuming but also subject to biases arising from the annotation process. This has led researchers to explore alternative ways of capturing human reasoning. One such method, the focus of this paper, uses webcam-based eye-tracking recordings to infer the importance scores typically derived from manual annotations. The authors examine WebQAmGaze, a multilingual dataset for information-seeking QA, and test how well it parallels traditional rationale annotations through a comprehensive comparison with attention- and explainability-based importance scores across multiple Transformer-based LLMs and languages.

Data and Methodology

The study builds on the WebQAmGaze dataset, which comprises webcam-based eye-tracking recordings collected while participants answer questions in English, Spanish, and German. It assesses whether gaze data, summarized through metrics such as total reading time, gaze entropy, and decoding accuracy, can serve as a reliable proxy for manually annotated rationales.
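To make the gaze metrics concrete, here is a minimal sketch of two of them: total reading time (fixation durations aggregated per token) and gaze entropy (Shannon entropy of the fixation distribution over tokens, where low entropy indicates focused reading and high entropy indicates dispersed attention). The function names and data shapes are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def total_reading_time(fixations):
    """Sum fixation durations (ms) per token index.

    `fixations` is assumed to be a list of (token_index, duration_ms) pairs.
    """
    times = {}
    for token_idx, duration_ms in fixations:
        times[token_idx] = times.get(token_idx, 0) + duration_ms
    return times

def gaze_entropy(fixated_tokens):
    """Shannon entropy (bits) of the fixation distribution over tokens.

    Low entropy = reading concentrated on few tokens; high = dispersed.
    """
    counts = Counter(fixated_tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example: two fixations on token 0, one on token 1.
reading_times = total_reading_time([(0, 200), (1, 150), (0, 100)])
entropy = gaze_entropy(["the", "the", "answer", "answer"])
```

Per-token reading times can then be normalized into an importance distribution that is directly comparable with model-based importance scores.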

Results

The analysis yielded several noteworthy findings:

  • Gaze Data as a Linguistic Insight Tool: The gaze data provided valuable insights into linguistic processing, potentially serving as an indicator of task difficulty.
  • Comparable Rankings: The rankings of explainability methods derived from gaze data closely mirrored those obtained from human-annotated rationales across the languages and models tested.
  • Effectiveness Across Languages: Decoding accuracy varied across languages, with particularly promising results for German. This suggests that the efficacy of gaze data as an alternative to manually annotated rationales might be language-dependent.
  • Webcam-based Eye-tracking Feasibility: Despite varying data quality primarily due to hardware constraints (e.g., the use of glasses affecting tracking accuracy), webcam-based eye-tracking emerged as a cost-effective method that could, with certain limitations, replicate the insights provided by lab-quality eye-tracking and manual annotations.
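The "comparable rankings" finding can be sketched as a rank-correlation check: if gaze-derived importance scores and human rationales order the tokens of a passage similarly, their Spearman correlation is high. The following pure-Python sketch (tie-aware ranking plus Pearson correlation of the ranks) illustrates the idea; it is an assumption about the general technique, not the paper's exact evaluation code.

```python
import math

def rankdata(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical per-token scores: gaze-based vs. attention-based importance.
gaze_scores = [0.4, 0.1, 0.3, 0.2]
model_scores = [0.5, 0.05, 0.35, 0.1]
agreement = spearman(gaze_scores, model_scores)
```

Computing such a correlation per explainability method, then comparing the resulting method rankings against those obtained from human rationales, is one straightforward way to operationalize "comparable rankings".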

Practical and Theoretical Implications

This paper conveys important implications for both the practical application of eye-tracking in XAI and the theoretical understanding of human reasoning processes in NLP tasks. Practically, leveraging webcam-based eye-tracking can significantly reduce the resources required for rationale annotation, making large-scale studies more feasible and possibly enriching datasets with additional annotations that capture a different dimension of human cognition. Theoretically, the findings support the hypothesis that human gaze patterns, indicative of cognitive engagement and information processing, can serve as a meaningful proxy for identifying relevant text spans that explain model decisions.

Future Directions

While this paper lays a solid foundation, further exploration is warranted. Future research should expand the variety of tasks and languages examined, integrate models beyond the Transformer architecture, and address the data-quality limitations of webcam-based eye-tracking. Additionally, combining gaze data with other psychophysiological signals could further enrich our understanding of the cognitive processes underpinning task performance and model explainability.

Conclusion

In summary, this paper advocates a shift toward low-cost, webcam-based eye-tracking as a viable supplementary, if not alternative, method for annotating rationales when evaluating NLP explainability. While it acknowledges the limitations of the current methodology, particularly regarding data quality and language diversity, it underscores the potential of gaze data to offer valuable insights into human cognition and reasoning. Going forward, bridging the gap between human cognitive processes and AI explainability remains a pivotal task, and gaze data can contribute significantly to it.
