Predicting the Type and Target of Offensive Posts in Social Media

Published 25 Feb 2019 in cs.CL | (1902.09666v2)

Abstract: As offensive content has become pervasive in social media, there has been much research in identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with and compare the performance of different machine learning models on OLID.

Citations (739)

Summary

The paper introduces a hierarchical annotation schema that categorizes offensive posts by detecting both type and target.
The study evaluates SVM, BiLSTM, and CNN models. The CNN achieves the highest macro-F1 at every level, tying with the BiLSTM on target identification.
The OLID dataset and findings offer practical insights for improving content moderation strategies on social media platforms.

Authors: Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

The study "Predicting the Type and Target of Offensive Posts in Social Media" addresses the problem of recognizing various forms of offensive content in online interactions, focusing on Twitter. Previous work concentrated on specific kinds of offensive language, such as hate speech or cyberbullying. This study instead treats offensive content detection as a single problem. The authors model the task hierarchically, identifying both the type and the target of an offense, which yields a more comprehensive framework.

Hierarchical Annotation Schema

The authors introduce the Offensive Language Identification Dataset (OLID), annotated using a detailed three-level hierarchical schema:

Level A: Offensive Language Detection
- NOT (Not Offensive): Posts devoid of any offensive language or profanity.
- OFF (Offensive): Posts containing unacceptable language, either targeted or untargeted.
Level B: Categorization of Offensive Language
- TIN (Targeted Insult): Posts containing specific threats or insults directed at an individual, group, or entity.
- UNT (Untargeted): Posts with general profanity or swearing without a specific target.
Level C: Offensive Language Target Identification
- IND (Individual): Posts targeting specific individuals.
- GRP (Group): Posts aimed at groups based on characteristics such as ethnicity, gender, or religious beliefs.
- OTH (Other): Posts targeting entities other than individuals or groups, such as organizations or events.

Because the levels are nested, only OFF posts receive a Level B label, and only TIN posts receive a Level C label. This structure lets the scheme separate, for example, untargeted profanity from an insult aimed at a group. That distinction is useful to social media platforms when moderating content.
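The hierarchy can be expressed as a small validity check on label triples. In this check, Level B applies only to OFF posts and Level C applies only to TIN posts. This is a minimal sketch. The function name `is_valid_label` and the use of `None` for "level not applicable" are illustrative assumptions, not OLID's actual file format.

```python
from typing import Optional

# Label sets for each level of the three-level scheme.
LEVEL_A = {"NOT", "OFF"}
LEVEL_B = {"TIN", "UNT"}
LEVEL_C = {"IND", "GRP", "OTH"}


def is_valid_label(a: str, b: Optional[str], c: Optional[str]) -> bool:
    """Check that a (Level A, Level B, Level C) triple respects the hierarchy.

    None marks a level that does not apply (an assumed convention).
    """
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # Non-offensive posts receive no Level B or Level C label.
        return b is None and c is None
    if b not in LEVEL_B:
        # Every offensive post must be either targeted or untargeted.
        return False
    if b == "UNT":
        # Untargeted posts have no target to identify.
        return c is None
    # Targeted insults must carry exactly one target category.
    return c in LEVEL_C
```

For example, `("OFF", "TIN", "GRP")` is accepted. `("NOT", "TIN", None)` is rejected, because a non-offensive post cannot carry a Level B label.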

Data Collection and Annotation

The dataset was compiled using the Twitter API by searching for keywords typically associated with offensive language. Political tweets are more likely to contain offensive language, so the authors stratified the collection to balance political and non-political content. Annotation was crowdsourced through Figure Eight. Strict annotator selection and agreement protocols were used to help ensure data quality.
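The summary does not spell out the agreement protocol. One common way to turn several crowd judgments into a single gold label is majority voting, with a fallback when no label wins a strict majority. The sketch below assumes that rule. `aggregate_votes` is a hypothetical helper, not the authors' documented procedure.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate_votes(votes: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the majority label, or None if no label exceeds min_agreement.

    A None result would typically send the item to an extra annotator
    or cause it to be discarded; which one is left to the caller.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority: more than min_agreement of all votes.
    if count / len(votes) > min_agreement:
        return label
    return None
```

Under this rule, two annotators who disagree (`["OFF", "NOT"]`) produce no label. A third vote then breaks the tie.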

Key statistics include:

- Training set size: 13,240 tweets
- Test set size: 860 tweets
- Class distribution: roughly 30% offensive and 70% non-offensive
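As a quick arithmetic check on these figures, the two splits sum to 14,100 tweets, and the test set makes up about 6% of the data. The snippet only re-derives these numbers from the listed counts. The variable names are illustrative.

```python
TRAIN_SIZE = 13_240
TEST_SIZE = 860

total = TRAIN_SIZE + TEST_SIZE   # all annotated tweets
test_share = TEST_SIZE / total   # fraction held out for testing

print(f"total={total:,} test_share={test_share:.1%}")
# prints: total=14,100 test_share=6.1%
```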

Experimental Evaluation

The authors evaluated three machine learning models on OLID: an SVM, a BiLSTM, and a CNN. Results are reported as macro-F1. Here are the notable findings:

Offensive Language Detection (Level A):
- CNN achieved the highest macro-F1 score (0.80), outperforming the BiLSTM and SVM models.
Categorization of Offensive Language (Level B):
- CNN again performed best, with a macro-F1 score of 0.69, and was particularly strong at identifying targeted insults (TIN).
Offensive Language Target Identification (Level C):
- The heterogeneous OTH category made this the hardest level. The CNN and BiLSTM performed comparably, both reaching a macro-F1 score of 0.47.
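Macro-F1 is the metric reported at all three levels. It averages the per-class F1 scores without weighting by class frequency, so the minority OFF class counts as much as the majority NOT class. Below is a minimal pure-Python sketch of the metric. The example predictions are synthetic: they use the roughly 70/30 class split only to show how a trivial classifier fares under this metric.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Undefined precision or recall (zero denominator) is treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Synthetic 70/30 split scored against a classifier that always answers NOT.
gold = ["NOT"] * 70 + ["OFF"] * 30
all_not = ["NOT"] * 100
print(round(macro_f1(gold, all_not), 4))  # prints 0.4118
```

The same all-NOT predictions reach an accuracy of 0.70, yet their macro-F1 is only about 0.41. For this reason, a macro-averaged score is the more informative number on an imbalanced dataset like OLID.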

Implications and Future Directions

The hierarchical approach described in this research provides a framework for detecting offensive language at multiple levels of granularity. In practice, OLID's schema and the accompanying machine learning baselines can support moderation on social media platforms, allowing offensive content to be handled in a more fine-grained way.
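One way to use the three levels in a moderation pipeline is a cascade. The Level A classifier runs first. The Level B classifier is called only for posts labeled OFF, and the Level C classifier only for posts labeled TIN. The sketch below assumes three independently trained classifiers exposed as plain callables. `classify_post` and the keyword-based `toy_*` functions are hypothetical placeholders, not the models from the paper.

```python
from typing import Callable, Optional, Tuple

Classifier = Callable[[str], str]


def classify_post(
    text: str,
    level_a: Classifier,
    level_b: Classifier,
    level_c: Classifier,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Run the Level A -> B -> C cascade and return the label triple."""
    a = level_a(text)
    if a == "NOT":
        return a, None, None       # stop: nothing offensive to categorize
    b = level_b(text)
    if b == "UNT":
        return a, b, None          # stop: untargeted, so no target to identify
    return a, b, level_c(text)     # targeted insult: identify the target


# Toy stand-ins for trained models (hypothetical keyword rules).
def toy_a(text: str) -> str:
    return "OFF" if "idiot" in text or "damn" in text else "NOT"


def toy_b(text: str) -> str:
    return "TIN" if "you" in text else "UNT"


def toy_c(text: str) -> str:
    return "IND"


print(classify_post("have a nice day", toy_a, toy_b, toy_c))    # ('NOT', None, None)
print(classify_post("damn this traffic", toy_a, toy_b, toy_c))  # ('OFF', 'UNT', None)
print(classify_post("you are an idiot", toy_a, toy_b, toy_c))   # ('OFF', 'TIN', 'IND')
```

A cascade applies each later classifier only to the kind of posts its level covers. The trade-off is that a Level A mistake propagates to the levels below it.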

Future research should further explore cross-corpus comparisons with other datasets on related tasks such as aggression and hate speech identification. Expanding OLID to include other languages while adhering to the structured hierarchical annotation can pave the way for more generalizable and internationally applicable models. The work opens avenues for refining offensive content detection mechanisms, contributing to the broader goal of maintaining healthier online discourse.
