Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter

Published 27 Feb 2018 in cs.CL | (1803.03662v2)

Abstract: In recent years, the increasing propagation of hate speech on social media and the urgent need for effective counter-measures have drawn significant investment from governments, companies, and researchers. A large number of methods have been developed for automated hate speech detection online. This aims to classify textual content into non-hate or hate speech, in which case the method may also identify the targeting characteristics (i.e., types of hate, such as race, and religion) in the hate speech. However, we notice significant difference between the performance of the two (i.e., non-hate v.s. hate). In this work, we argue for a focus on the latter problem for practical reasons. We show that it is a much more challenging task, as our analysis of the language in the typical datasets shows that hate speech lacks unique, discriminative features and therefore is found in the 'long tail' in a dataset that is difficult to discover. We then propose Deep Neural Network structures serving as feature extractors that are particularly effective for capturing the semantics of hate speech. Our methods are evaluated on the largest collection of hate speech datasets based on Twitter, and are shown to be able to outperform the best performing method by up to 5 percentage points in macro-average F1, or 8 percentage points in the more challenging case of identifying hateful content.


Authors (2)

Citations (274)


Summary

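The paper proposes two DNN architectures (CNN+GRU and a CNN with skipped convolutions, CNN+sCNN) that outperform the best prior method by up to 5 percentage points in macro-average F1, and by up to 8 percentage points on the harder task of identifying hateful content.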
The study reveals that traditional n-gram features are insufficient due to the subtle and sparse nature of hate speech in imbalanced Twitter datasets.
Tweet-specific text normalization, combined with pre-trained word embeddings (Word2Vec, GloVe, and Twitter-specific embeddings), gives the neural models a more robust feature space.

Hate Speech Detection: Examining the Long Tail on Twitter

The paper by Zhang and Luo investigates hate speech detection on Twitter, a platform characterized by short messages and diverse usage patterns. Despite many advances in automated hate speech detection, the paper argues that existing methods perform far better at identifying non-hate content than at identifying hate speech itself. The paper therefore redirects attention to the harder problem of classifying hate speech in what it calls the "long tail": hate speech instances are sparse and lack distinctive linguistic characteristics.

Key Findings and Contributions

The authors conducted an empirical analysis of several publicly available Twitter datasets to understand the inherent challenges in hate speech detection. This analysis revealed a pronounced class imbalance, where non-hate content vastly outnumbers hate speech across datasets. Moreover, hate speech, when present, does not exhibit unique linguistic features, rendering traditional n-gram based features less effective in discriminating it from non-hate content.
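The lack of class-specific vocabulary can be illustrated with a small sketch. The function below is a hypothetical measure written for this summary, not the authors' exact metric: for each class it computes the share of that class's vocabulary that appears in no other class. The tweets and labels are made-up toy data.

```python
from collections import defaultdict

def class_unique_ratio(tweets, labels):
    """For each class, return the share of its vocabulary seen in no other class.

    Illustrative measure only (an assumption of this sketch, not the paper's metric).
    """
    vocab = defaultdict(set)
    for text, label in zip(tweets, labels):
        vocab[label].update(text.lower().split())

    ratios = {}
    for label, words in vocab.items():
        # Collect every word used by the other classes.
        others = set()
        for other_label, other_words in vocab.items():
            if other_label != label:
                others |= other_words
        ratios[label] = len(words - others) / len(words) if words else 0.0
    return ratios

# Toy, made-up data: three non-hate tweets and one hate tweet.
tweets = [
    "great match today",
    "traffic was awful today",
    "those people are great",
    "those people are awful",
]
labels = ["non-hate", "non-hate", "non-hate", "hate"]

ratios = class_unique_ratio(tweets, labels)
print(ratios)
```

In this toy set every word of the hate tweet also occurs in non-hate tweets, so the hate class has no vocabulary of its own. That is the situation in which bag-of-words or n-gram features carry little discriminative signal.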

To address these challenges, Zhang and Luo propose two approaches based on Deep Neural Networks (DNNs): a network that combines a Convolutional Neural Network (CNN) with Gated Recurrent Units (GRU), referred to as CNN+GRU, and a CNN extended with "skipped" convolutions (CNN+sCNN) that capture relationships between non-adjacent words, akin to skip-grams. Both methods aim to capture semantics more effectively by modeling dependencies between words or phrases that conventional n-gram features miss.
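The idea of a skipped convolution can be sketched in plain Python as a window with a binary mask, where a 0 marks a position that is ignored. This is a minimal illustration under stated assumptions: the function name `skipped_conv1d`, the mask patterns, and the toy one-dimensional embeddings are all hypothetical, and the paper's actual layer configuration may differ.

```python
def skipped_conv1d(embeddings, kernel, mask):
    """Slide a masked window over a sequence of word vectors.

    embeddings: list of T vectors (each a list of d floats)
    kernel:     list of w vectors (same d), one per window position
    mask:       list of w ints; 0 marks a skipped ("gap") position
    Returns T - w + 1 activations (no padding, no non-linearity).
    """
    width = len(mask)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        total = 0.0
        for offset in range(width):
            if mask[offset] == 0:
                continue  # gap: this word does not contribute to the feature
            word_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(word_vec, weights))
        outputs.append(total)
    return outputs

# Toy 1-dimensional "embeddings" for a 5-token tweet.
embeddings = [[1.0], [2.0], [3.0], [4.0], [5.0]]
kernel = [[1.0], [1.0], [1.0]]

contiguous = skipped_conv1d(embeddings, kernel, mask=[1, 1, 1])  # ordinary trigram window
skipped = skipped_conv1d(embeddings, kernel, mask=[1, 0, 1])     # first and third word only
print(contiguous, skipped)
```

With the mask `[1, 0, 1]`, each feature depends on two words separated by one skipped token, so a pattern such as "X ... Y" can be matched regardless of the word in between.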

Empirical evaluations showed that these DNN-based methods outperform existing state-of-the-art methods, such as those proposed by Davidson et al., Gambäck et al., and Park et al. The proposed methods improved macro-average F1 by up to 5 percentage points over the best-performing baseline, and by up to 8 percentage points on the more challenging task of identifying hateful content, with the largest gains where training data for hate classes is scarce.
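The reason for reporting macro-average F1 and hate-class F1 separately can be seen in a small worked example. The sketch below uses made-up labels and a trivial classifier that always predicts the majority class; the helper names are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy imbalanced data: 9 non-hate tweets, 1 hate tweet.
y_true = ["non-hate"] * 9 + ["hate"]
# A classifier that never predicts "hate".
y_pred = ["non-hate"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = f1_per_class(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(accuracy, per_class, macro)
```

Here accuracy is 0.9 and non-hate F1 is about 0.95, yet hate F1 is 0.0 and macro F1 drops to about 0.47. Metrics dominated by the majority class can therefore hide a complete failure on hateful content.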

Methodological Innovations

The study's main methodological contribution is the skipped CNN. By allowing gaps within the convolution window, it extracts features from non-contiguous words and so captures dependencies that span intervening tokens. Conventional CNN layers in the baseline models, by contrast, operate primarily on contiguous word sequences.

Another notable contribution is the data preprocessing step, which normalizes Twitter-specific idiosyncrasies to mitigate noise in textual data. This preprocessing, combined with the use of pre-trained word embeddings (Word2Vec, GloVe, and Twitter-specific embeddings), creates a more robust feature space for the neural models to operate within.
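A normalization step of this kind might look like the sketch below. The specific rules (lowercasing, URL and mention placeholders, stripping the hashtag symbol, shortening elongated characters) and the placeholder tokens `<url>` and `<user>` are assumptions chosen for illustration; the summary does not list the authors' exact rules.

```python
import re

# Illustrative patterns; the authors' actual rules may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
ELONGATED_RE = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times
WHITESPACE_RE = re.compile(r"\s+")

def normalize_tweet(text):
    """Reduce Twitter-specific noise before tokenization."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # replace links with a placeholder
    text = MENTION_RE.sub(" <user> ", text)   # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)        # keep the hashtag word, drop '#'
    text = ELONGATED_RE.sub(r"\1\1", text)    # "sooooo" -> "soo"
    return WHITESPACE_RE.sub(" ", text).strip()

example = normalize_tweet("@Bob Sooooo COOL!!! #MondayMood https://t.co/abc")
print(example)
```

Mapping many surface variants to one form shrinks the vocabulary, which increases the chance that a token is covered by the pre-trained embeddings.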

Implications and Future Research Directions

The implications of this research are twofold. Practically, the advancement of more accurate detection models aids in the mitigation of hate speech on platforms like Twitter, addressing both user safety and compliance with regulatory demands. Theoretically, this work encourages future research to account for the long-tail distribution in semantic analysis tasks, advocating for models that emphasize subtle semantic cues within imbalanced datasets.

Further research could explore additional contextual features beyond text, such as user metadata or network-based features, to enrich the semantic understanding of tweets. Investigations into transfer learning may also provide avenues to leverage feature similarities across various hate classes, potentially alleviating some data scarcity issues. Lastly, adapting these methods to other short-text platforms could uncover domain-specific challenges and solutions.

In summary, Zhang and Luo show that hate speech detection is far from solved once performance on the hate classes is examined separately, and they propose neural architectures designed for this harder classification problem on imbalanced social media datasets. Their contributions highlight the need to adapt machine learning approaches to the subtle, sparse language of hate speech in digital communication.
