- The paper demonstrates that annotator consistency is low (Krippendorff's alpha ranging from 0.18 to 0.29), calling into question the effectiveness of binary classification methods.
- The study shows that providing a hate speech definition aligns the two annotator groups' ratings (r = 0.895) but does not eliminate the underlying ambiguity of the annotation task.
- The research advocates shifting to a regression approach that measures degrees of hatefulness to improve automated detection systems.
Evaluating the Consistency of Hate Speech Annotations in the Context of the European Refugee Crisis
The paper "Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis" addresses a significant challenge in the field of computational social science—reliably annotating hate speech for the development of automated detection systems. The research focuses specifically on hate speech related to the European refugee crisis, an area that has not been extensively explored, particularly within the German linguistic context.
Research Motivation and Methodology
The paper acknowledges the rising concern over the proliferation of hate speech on social media platforms and the corresponding societal and legislative interest in mitigating its spread. Automatic classification methods for hate speech rely heavily on accurately annotated datasets, so the reliability of those annotations directly bounds the potential efficacy of machine learning classifiers. The authors compiled a novel German-language corpus related to the refugee crisis, sourced from Twitter posts using specific hashtags as proxies for potentially hateful content.
The authors implemented two distinct annotation strategies with internet users to evaluate the reliability of hate speech annotations. One cohort was shown a specific hate speech definition, while the other was not, allowing the authors to assess how a prior definition influences user perceptions and annotation consistency.
Key Findings
The findings reveal notably low inter-rater reliability (Krippendorff's alpha ranging from 0.18 to 0.29), indicating substantial variability in how different annotators identify hate speech, regardless of whether a preliminary definition was provided. This underscores the subjectivity inherent in hate speech detection and the difficulty existing definitions have in capturing the full scope of what constitutes hate speech.
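For concreteness, here is a minimal, self-contained sketch of how Krippendorff's alpha is computed for nominal labels. The two-annotator toy data, function name, and 0/1 coding are illustrative assumptions, not the paper's actual annotation matrix.

```python
import numpy as np

def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data.

    ratings: array of shape (n_annotators, n_units); np.nan marks
    a missing annotation.
    """
    ratings = np.asarray(ratings, dtype=float)
    values = np.unique(ratings[~np.isnan(ratings)])
    idx = {v: i for i, v in enumerate(values)}
    coincidence = np.zeros((len(values), len(values)))

    # Build the coincidence matrix: every ordered pair of ratings
    # within a unit contributes 1 / (m - 1), where m is the number
    # of ratings that unit received.
    for unit in ratings.T:
        unit = unit[~np.isnan(unit)]
        m = len(unit)
        if m < 2:
            continue  # a unit with a single rating carries no pairing info
        for a in range(m):
            for b in range(m):
                if a != b:
                    coincidence[idx[unit[a]], idx[unit[b]]] += 1.0 / (m - 1)

    n_c = coincidence.sum(axis=0)  # marginal total per category
    n = n_c.sum()
    observed_disagreement = coincidence.sum() - np.trace(coincidence)
    # alpha = 1 - D_o / D_e, simplified for the nominal 0/1 distance
    return 1.0 - (n - 1) * observed_disagreement / (n ** 2 - (n_c ** 2).sum())

# Hypothetical example: two annotators, six tweets, 1 = hate speech
annotator_a = [1, 0, 1, 1, 0, 0]
annotator_b = [1, 0, 0, 1, 1, 0]
print(krippendorff_alpha_nominal([annotator_a, annotator_b]))  # ~0.39
```

Values near 1 indicate near-perfect agreement beyond chance; the 0.18 to 0.29 range reported in the paper sits far below common reliability thresholds.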
Additionally, the results showed that introducing a hate speech definition did not universally enhance annotation reliability. The two groups' ratings were nonetheless strongly correlated (r = .895, p < .0001) despite the low within-group agreement, suggesting that a definition anchors annotators to a common construct but does not resolve the ambiguities present in the data.
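As a small illustration of how such a group-level correlation is computed, a sketch using scipy's pearsonr; the per-tweet mean ratings below are invented placeholders, since the paper reports only the resulting statistic.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-tweet mean hatefulness ratings from the two cohorts
group_with_definition = np.array([0.2, 0.8, 0.5, 0.9, 0.1, 0.4])
group_without_definition = np.array([0.3, 0.7, 0.6, 0.8, 0.2, 0.4])

r, p = pearsonr(group_with_definition, group_without_definition)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A high correlation between group means can coexist with low inter-rater reliability: averaging smooths over individual disagreement that alpha, computed per rating, still penalizes.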
Implications and Future Directions
The paper indicates that relying on binary yes/no classification for hate speech may not be optimal, because perceptions of hate speech vary with individual background and cultural context. Instead, treating hate speech detection as a regression problem that estimates a degree of hatefulness could offer a more robust and informative detection model, as sketched below. Developing more refined and culturally contextual annotation guidelines could further increase the reliability of annotated datasets.
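The following is a minimal sketch of that regression framing, assuming scikit-learn's Ridge regression over TF-IDF features; the model choice, example tweets, and scores are illustrative assumptions, not the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: tweets paired with a continuous
# hatefulness score, e.g. the mean of several annotators' ratings.
tweets = [
    "refugees are welcome here",
    "close the borders now",
    "these people do not belong in our country",
    "we should help families fleeing war",
]
hatefulness = [0.0, 0.4, 0.8, 0.1]  # 0 = not hateful, 1 = maximally hateful

# Regress a degree of hatefulness instead of predicting a yes/no label.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(tweets, hatefulness)

print(model.predict(["send them all back"]))  # a continuous score
```

Averaging annotator ratings into a continuous target keeps disagreement in the data as signal, rather than discarding it at an arbitrary yes/no threshold.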
The research also points towards a future in which the mechanisms behind hate speech dissemination and the psychological triggers for spreading such content are better understood. This understanding could be crucial not only for constructing more effective detection systems but also for developing interventions that mitigate the harms of hate speech at both individual and societal levels.
Conclusion
In conclusion, the paper contributes to the ongoing discourse on the complexity of annotating hate speech and its ramifications for automated detection systems. Given the low consistency observed in user annotations, the research calls for improved, more detailed guidelines and suggests shifting hate speech detection from binary classification to a more nuanced approach focused on the degree of hatefulness. This work forms a basis for future research aimed at better understanding and automatically identifying hate speech across varied social and linguistic contexts.