Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf (2312.03727v1)
Abstract: Although online social movements can quickly go viral on social media, language can be a barrier to timely monitoring and analysis of the underlying online social behaviors (OSB). This is especially true for under-resourced languages on social media such as dialectal Arabic, the primary language used by Arabs on social media. It is therefore crucial to provide solutions that efficiently exploit resources from high-resourced languages to solve language-dependent OSB analysis in under-resourced languages. This paper proposes localizing the content of resources in high-resourced languages into under-resourced Arabic dialects. Content localization goes beyond content translation, which converts text from one language to another: it adapts culture, language nuances, and regional preferences from one language to a specific language or dialect. Automating the understanding of natural, familiar day-to-day expressions in different regions is key to achieving a wider analysis of OSB, especially for smart cities. In this paper, we use content-localization-based neural machine translation to develop sentiment and hate classifiers for two low-resourced Arabic dialects: Levantine and Gulf. We also leverage unsupervised learning to facilitate the analysis of sentiment and hate predictions by inferring hidden topics from the corresponding data and providing coherent interpretations of those topics in their native language or dialect. Experimental evaluations and a proof-of-concept COVID-19 case study on real data validate the effectiveness of the proposed system in precisely distinguishing sentiments and accurately identifying hate content in both Levantine and Gulf Arabic dialects. Our findings highlight the importance of considering the unique nature of dialects within the same language: ignoring the dialectal aspect would lead to misleading analysis.
Authors: Fatimah Alzamzami, Abdulmotaleb El Saddik