Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection (2205.04238v1)

Published 9 May 2022 in cs.CL

Abstract: Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited to CAD promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet over-relying on core features may lead to unintended model bias. In particular, construct-driven CAD (perturbations of core features) may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hateful and non-sexist usage of identity and gendered terms. In these hard cases, models trained on CAD, especially construct-driven CAD, show higher false-positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD, both construct-driven and construct-agnostic, reduces such unintended bias.
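
The comparison the abstract describes rests on the false-positive rate over hard cases, that is, texts that use identity or gendered terms but are not hateful or sexist. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name, variable names, and toy predictions are all hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def false_positive_rate(labels, predictions):
    """Return FP / (FP + TN), computed over items whose true label is 0 (negative)."""
    # Keep only the predictions for truly negative (non-hateful) items.
    negatives = [p for y, p in zip(labels, predictions) if y == 0]
    if not negatives:
        raise ValueError("FPR is undefined without negative examples")
    # A false positive is a negative item that the model flagged as positive (1).
    return sum(1 for p in negatives if p == 1) / len(negatives)


# Hypothetical toy data: every hard case is non-hateful, so its true label is 0.
hard_case_labels = [0, 0, 0, 0]
preds_original = [0, 0, 1, 0]  # made-up output of a model trained on original data
preds_cad = [1, 0, 1, 0]       # made-up output of a model trained on CAD

fpr_original = false_positive_rate(hard_case_labels, preds_original)
fpr_cad = false_positive_rate(hard_case_labels, preds_cad)
print(fpr_original, fpr_cad)
```

Because every hard case is a true negative, each positive prediction counts as a false positive, so a higher value means the model more often flags benign uses of identity or gendered terms.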

Authors (4)

Indira Sen (14 papers)
Mattia Samory (16 papers)
Claudia Wagner (37 papers)
Isabelle Augenstein (131 papers)

Citations (16)
