
Abstract

In the evolving landscape of online communication, hate speech detection remains a formidable challenge, further compounded by the diversity of digital platforms. This study investigates the effectiveness and adaptability of pre-trained and fine-tuned LLMs in identifying hate speech, addressing three central questions: (1) To what extent does model performance depend on the fine-tuning and training parameters? (2) To what extent do models generalize to cross-domain hate speech detection? (3) What specific features of the datasets or models influence generalization potential? Our experiments show that LLMs offer a huge advantage over the state of the art even without pretraining. To answer (1), we analyze 36 in-domain classifiers comprising LLaMA, Vicuna, and their variations in pre-trained and fine-tuned states across nine publicly available datasets that span a wide range of platforms and discussion forums. To answer (2), we assess the performance of 288 out-of-domain classifiers for a given end-domain dataset. To answer (3), ordinary least squares analyses suggest that the advantage of training with fine-grained hate speech labels is greater for smaller training datasets but washes away as the dataset size increases. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.
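
The following is a minimal, hypothetical sketch of the kind of ordinary-least-squares analysis the abstract describes: regressing out-of-domain performance on training-set size and a fine-grained-label indicator, with an interaction term whose negative sign would correspond to the reported pattern (fine-grained labels help more for smaller training sets). The column names, coefficients, and synthetic data are illustrative assumptions, not the authors' actual data or code.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 288  # one row per out-of-domain classifier evaluation, as in the paper

df = pd.DataFrame({
    # log10 of the source (training) dataset size -- assumed predictor name
    "log_train_size": rng.uniform(3.0, 5.0, n),
    # 1 if the source dataset uses fine-grained hate speech labels, else 0
    "fine_grained": rng.integers(0, 2, n),
})
# Synthetic outcome: fine-grained labels help, but the benefit shrinks as the
# training set grows (negative interaction), mirroring the reported finding.
df["macro_f1"] = (
    0.45
    + 0.05 * df["log_train_size"]
    + 0.30 * df["fine_grained"]
    - 0.06 * df["fine_grained"] * df["log_train_size"]
    + rng.normal(0, 0.03, n)
)

# OLS with main effects and the granularity-by-size interaction.
model = smf.ols("macro_f1 ~ log_train_size * fine_grained", data=df).fit()
print(model.summary())

In such a specification, the coefficient on the interaction term captures how the benefit of fine-grained labels changes with training-set size; a negative estimate is one way the "washed away with larger datasets" conclusion could manifest.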
