Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Published 1 Jun 2023 in cs.CL | (2306.01105v2)

Abstract: Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale code-mixed crowdsourced dataset of around 51k posts for hate speech detection from Twitter. GOTHate is neutrally seeded, encompassing different languages and topics. We conduct detailed comparisons of GOTHate with the existing hate speech datasets, highlighting its novelty. We benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. Thus, we investigate how adding endogenous signals enhances the hate speech detection task. We augment GOTHate with the user's timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution HEN-mBERT is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT transcends the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as a part of their mission to tackle online harm.

Summary

The paper introduces the GOTHate dataset of around 51,000 tweets annotated with four labels (Hate, Offensive, Provocative, and Neutral), collected with neutral seeding rather than hate lexicons to reduce bias.
The study presents HEN-mBERT, a modular model that integrates user-centric signals and improves on the best baseline by 2.5% in overall macro-F1 and 5% in hate-class F1.
Its findings pave the way for context-aware automated moderation systems, urging a reexamination of hate speech dataset curation practices.

The paper "Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment" addresses a weakness in hate speech detection on social media platforms: existing benchmarks are compiled predominantly from hate lexicons and therefore fail to capture the nuanced nature of online hate speech. To address this, the authors introduce a new dataset, GOTHate, which spans a diverse range of socio-political topics and languages, including English, Hindi, and Hinglish, and so offers a more realistic representation of the variability in online discourse.

GOTHate consists of around 51,000 Twitter posts annotated with four labels: Hate, Offensive, Provocative, and Neutral. The dataset is neutrally seeded, meaning that posts are not collected using predefined hate lexicons, which tend to skew a dataset towards explicit content. As a result, GOTHate is a more challenging environment for classification models, one in which context matters more than explicit keyword triggers. The authors argue that this approach reduces linguistic and syntactic biases and yields a more representative sample of real-world hate speech.

Compared with existing datasets such as those by Davidson et al. (2017) and Founta et al. (2018), GOTHate exhibits lower inter-class divergence, as measured by Jensen-Shannon divergence, indicating greater overlap between classes. The more the classes overlap, the fewer surface cues separate them, which arguably makes GOTHate one of the more challenging datasets for classification models. The paper also runs adversarial validation and cross-dataset validation against other hate speech datasets, and uses these comparisons to show how GOTHate differs from them.
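To make the divergence comparison concrete, here is a minimal sketch of Jensen-Shannon divergence between the word distributions of two classes. It assumes whitespace tokenization, unigram counts, and add-one smoothing; these choices and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import math
from collections import Counter


def unigram_distribution(posts, vocab):
    """Add-one smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tok for post in posts for tok in post.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def class_divergence(posts_a, posts_b):
    """JS divergence between the unigram distributions of two classes."""
    vocab = sorted(
        {tok for post in posts_a + posts_b for tok in post.lower().split()}
    )
    p = unigram_distribution(posts_a, vocab)
    q = unigram_distribution(posts_b, vocab)
    return js_divergence(p, q)


if __name__ == "__main__":
    # Toy posts, invented for illustration only.
    overlapping = class_divergence(["a b c"], ["a b d"])
    disjoint = class_divergence(["a b"], ["c d"])
    print(overlapping < disjoint)
```

A value near 0 means the two classes use nearly the same vocabulary with similar frequencies; a value near 1 means they barely overlap. Lower values across class pairs correspond to the harder setting described above.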

The authors also propose HEN-mBERT, a modular extension of multilingual BERT (mBERT) that incorporates user-centric auxiliary signals: the user's timeline (history), ego network (topology), and exemplars. Its mixture-of-experts design lets each signal enrich the text representation separately, which helps the model cope with the variability of hate speech. In the reported experiments, HEN-mBERT outperforms the best baseline by 2.5% in overall macro-F1 and by 5% in hate-class F1, with gains especially in harder-to-detect classes such as Hate and Provocative.
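Because the results are reported as both an overall macro-F1 and a per-class F1 for the Hate label, the following sketch shows how the two metrics relate. The label names come from the dataset description; the function names and toy predictions are assumptions made for illustration, not data from the paper.

```python
LABELS = ["Hate", "Offensive", "Provocative", "Neutral"]


def per_class_f1(y_true, y_pred, labels):
    """F1 for each label, treating that label as the positive class."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Convention used here: F1 is 0.0 when a label never occurs
        # in either the gold labels or the predictions.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


if __name__ == "__main__":
    # Toy gold labels and predictions, invented for illustration only.
    gold = ["Hate", "Hate", "Offensive", "Neutral"]
    pred = ["Hate", "Neutral", "Offensive", "Neutral"]
    print(per_class_f1(gold, pred, LABELS)["Hate"])
    print(macro_f1(gold, pred, LABELS))
```

Macro-F1 averages the classes without weighting by frequency, so a model cannot score well by doing well only on the majority class. That is why an improvement on a minority class such as Hate is reported alongside the overall figure.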

The practical implications concern content moderation. More robust contextual hate speech detection could support automated content moderation systems and help curb the spread of harmful content on social media. In partnership with Wipro AI, and building on these experiments, the researchers are developing a semi-automated pipeline for detecting hateful content, in which model output assists human moderators in identifying and flagging hate speech.
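One way to picture a semi-automated moderation step is confidence-based triage, in which the model's class probabilities decide which posts go to human moderators. The sketch below is purely hypothetical: the routing rules, the `auto_threshold` value, and all names are assumptions for illustration and do not describe the pipeline the authors are building.

```python
from dataclasses import dataclass

LABELS = ("Hate", "Offensive", "Provocative", "Neutral")


@dataclass
class Decision:
    label: str         # most probable label according to the model
    confidence: float  # probability assigned to that label
    route: str         # "no_action", "human_review", or "priority_review"


def triage(probabilities, auto_threshold=0.9):
    """Route one post based on a dict mapping each label to a probability.

    Hypothetical policy: only confidently Neutral posts skip review,
    posts predicted as Hate are prioritized, and the rest are queued.
    """
    if set(probabilities) != set(LABELS):
        raise ValueError("expected one probability per label")
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    if label == "Neutral" and confidence >= auto_threshold:
        route = "no_action"
    elif label == "Hate":
        route = "priority_review"
    else:
        route = "human_review"
    return Decision(label, confidence, route)


if __name__ == "__main__":
    # Toy probabilities, invented for illustration only.
    post = {"Hate": 0.7, "Offensive": 0.1, "Provocative": 0.1, "Neutral": 0.1}
    print(triage(post).route)
```

In a policy like this the final decision on flagged posts stays with a human, and the model only reduces and orders the review queue.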

On the research side, the paper calls on the community to reconsider how hate speech datasets are curated and used for training and evaluation. The approach behind GOTHate points towards detection methods that emphasize context and user behavior over simple keyword matching.

Future work could explore broader applications of the proposed methods to other forms of online toxicity beyond hate speech, such as misinformation and harassment. Moreover, the integration of this work into real-world systems raises intriguing possibilities for longitudinal studies on the impact of advanced detection systems on reducing hate speech prevalence.

In conclusion, the GOTHate dataset and the HEN-mBERT model move hate speech detection towards more realistic data and context-aware modeling. The work motivates further research into incorporating user and context signals into hate speech detection frameworks, with relevance to both academic research and practical online safety applications.
