Papers
Topics
Authors
Recent
2000 character limit reached

Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets (2407.02448v1)

Published 2 Jul 2024 in cs.CL and cs.AI

Abstract: Today, hate speech classification from Arabic tweets has drawn the attention of several researchers. Many systems and techniques have been developed to resolve this classification task. Nevertheless, two of the major challenges faced in this context are the limited performance and the problem of imbalanced data. In this study, we propose a novel approach that leverages ensemble learning and semi-supervised learning based on previously manually labeled. We conducted experiments on a benchmark dataset by classifying Arabic tweets into 5 distinct classes: non-hate, general hate, racial, religious, or sexism. Experimental results show that: (1) ensemble learning based on pre-trained LLMs outperforms existing related works; (2) Our proposed data augmentation improves the accuracy results of hate speech detection from Arabic tweets and outperforms existing related works. Our main contribution is the achievement of encouraging results in Arabic hate speech detection.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.