Emergent Mind

Abstract

Rationales in the form of manually annotated input spans usually serve as ground truth when evaluating explainability methods in NLP. They are, however, time-consuming and often biased by the annotation process. In this paper, we debate whether human gaze, in the form of webcam-based eye-tracking recordings, poses a valid alternative when evaluating importance scores. We evaluate the additional information provided by gaze data, such as total reading times, gaze entropy, and decoding accuracy with respect to human rationale annotations. We compare WebQAmGaze, a multilingual dataset for information-seeking QA, with attention and explainability-based importance scores for 4 different multilingual Transformer-based language models (mBERT, distil-mBERT, XLMR, and XLMR-L) and 3 languages (English, Spanish, and German). Our pipeline can easily be applied to other tasks and languages. Our findings suggest that gaze data offers valuable linguistic insights that could be leveraged to infer task difficulty and further show a comparable ranking of explainability methods to that of human rationales.

Overview

  • This paper explores webcam-based eye-tracking as an alternative to manual annotation of rationales in NLP and XAI, using the multilingual WebQAmGaze dataset.

  • It evaluates gaze data's reliability through metrics like total reading times and gaze entropy, comparing these to traditional annotations across several languages and Transformer-based models.

  • Findings suggest gaze data can provide meaningful insights into cognitive processes, with its effectiveness varying by language, and yields a ranking of explainability methods that closely matches the one derived from human-annotated rationales.

  • The study advocates for incorporating webcam-based eye-tracking in XAI to reduce annotation resources, despite hardware limitations and the need for further research on task and language diversity.

Assessing Webcam-based Eye-tracking as a Viable Alternative for Annotating Rationales in NLP Explainability

Introduction

In the realm of NLP and Explainable AI (XAI), the annotation of rationales has been a cornerstone for evaluating the effectiveness and reliability of models. However, manually annotating these rationales is not only time-consuming but also subject to biases arising from the annotation process. This has led researchers to explore alternative ways of capturing human reasoning processes. One such method, the focus of this paper, uses webcam-based eye-tracking recordings to infer the importance scores typically derived from manual annotations. We examine the WebQAmGaze dataset, a multilingual corpus for QA tasks, and its ability to parallel traditional rationale annotations through a comprehensive comparison with attention and explainability-based importance scores across multiple Transformer-based language models and languages.

Data and Methodology

The study revolves around the WebQAmGaze dataset, which comprises webcam-based eye-tracking recordings collected as participants answer questions in English, Spanish, and German. The goal is to assess whether gaze data, quantified through metrics such as total reading time, gaze entropy, and decoding accuracy, can serve as a reliable proxy for manually annotated rationales.
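To make these metrics concrete, here is a minimal sketch of how total reading time and gaze entropy could be computed from word-level fixation data. The function name, the input format (a list of `(word_index, duration_ms)` fixation pairs), and the mapping of gaze points to words are assumptions for illustration, not the paper's actual pipeline:

```python
import math
from collections import defaultdict

def gaze_metrics(fixations, n_words):
    """Aggregate word-level fixations into total reading time (TRT)
    per word and a single gaze-entropy value for the passage.

    fixations: list of (word_index, duration_ms) pairs, e.g. obtained
    after mapping webcam gaze estimates onto word bounding boxes.
    n_words:   number of words in the passage.
    """
    # Total reading time: summed fixation duration per word
    trt = defaultdict(float)
    for word_idx, duration in fixations:
        trt[word_idx] += duration

    # Normalize per-word durations into a probability distribution
    total = sum(trt.values())
    probs = [trt[i] / total for i in range(n_words) if trt[i] > 0]

    # Shannon entropy of that distribution: high values mean gaze was
    # spread evenly across the text, low values mean focused reading
    entropy = -sum(p * math.log2(p) for p in probs)
    return dict(trt), entropy
```

For example, two words fixated for equal durations yield the maximum entropy of 1 bit, while all gaze on a single word yields 0; lower entropy on the relevant span would indicate more targeted, rationale-like reading.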

Results

Our analysis yielded several noteworthy findings:

  • Gaze Data as a Linguistic Insight Tool: The gaze data provided valuable insights into linguistic processing, potentially serving as an indicator of task difficulty.
  • Comparable Rankings: The rankings of explainability methods derived from gaze data closely mirrored those obtained from human-annotated rationales across the languages and models tested.
  • Effectiveness Across Languages: Decoding accuracy varied across languages, with particularly promising results for German. This suggests that the efficacy of gaze data as an alternative to manually annotated rationales might be language-dependent.
  • Webcam-based Eye-tracking Feasibility: Despite varying data quality primarily due to hardware constraints (e.g., the use of glasses affecting tracking accuracy), webcam-based eye-tracking emerged as a cost-effective method that could, with certain limitations, replicate the insights provided by lab-quality eye-tracking and manual annotations.

Practical and Theoretical Implications

This study conveys important implications for both the practical application of eye-tracking in XAI and the theoretical understanding of human reasoning processes in NLP tasks. Practically, leveraging webcam-based eye-tracking can significantly reduce the resources required for rationale annotation, making large-scale studies more feasible and possibly enriching datasets with additional annotations that capture a different dimension of human cognition. Theoretically, the findings support the hypothesis that human gaze patterns, indicative of cognitive engagement and information processing, can serve as a meaningful proxy for identifying relevant text spans that explain model decisions.

Future Directions

While this study lays a solid foundation, extensive exploration is warranted. Future research should focus on expanding the variety of tasks and languages examined, integrating more sophisticated models beyond the Transformer architecture, and addressing the limitations associated with data quality in webcam-based eye-tracking. Additionally, integrating gaze data with other psychophysiological signals could further enrich our understanding of the cognitive processes underpinning task performance and model explainability.

Conclusion

In summary, this paper advocates for a shift towards low-cost, webcam-based eye-tracking as a viable supplementary, if not alternative, method for annotating rationales in the evaluation of NLP explainability. While it recognizes the limitations of the current methodology, particularly in data quality and language diversity, it underscores the potential of gaze data to offer valuable insights into human cognition and reasoning. Going forward, bridging the gap between human cognitive processes and AI explainability remains a pivotal task, and gaze data can contribute significantly to it.
