Collective moderation of hate, toxicity, and extremity in online discussions (2303.00357v4)

Published 1 Mar 2023 in cs.CY

Abstract: How can citizens address hate in online discourse? We analyze a large corpus of more than 130,000 discussions on Twitter over four years. With the help of human annotators, LLMs and machine learning classifiers, we identify different dimensions of discourse that might be related to the probability of hate speech in subsequent tweets. We use a matching approach and longitudinal statistical analyses to discern the effectiveness of different counter speech strategies on the micro-level (individual tweet pairs), meso-level (discussion trees) and macro-level (days) of discourse. We find that expressing simple opinions, not necessarily supported by facts, but without insults, relates to the least hate in subsequent discussions. Sarcasm can be helpful as well, in particular in the presence of organized extreme groups. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse. A pronounced emotional tone, either negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse discourse quality. We obtain similar results for other measures of quality of discourse beyond hate speech, including toxicity, extremity of speech, and the presence of extreme speakers. Going beyond one-shot analyses on smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.

Citations (3)

View on Semantic Scholar