
Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations (2401.09899v1)

Published 18 Jan 2024 in cs.CL

Abstract: Internet memes have gained significant influence in communicating political, psychological, and sociocultural ideas. While memes are often humorous, their use for trolling and cyberbullying has risen. Although a wide variety of effective deep learning-based models have been developed for detecting offensive multimodal memes, few works have addressed the explainability aspect. Recent legislation, such as the "right to explanation" in the General Data Protection Regulation, has spurred research into interpretable models rather than a sole focus on performance. Motivated by this, we introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation of code-mixed cyberbullying memes. Here, both visual and textual modalities are highlighted to explain why a given meme is cyberbullying. A Contrastive Language-Image Pretraining (CLIP) projection-based multimodal shared-private multitask approach is proposed for the visual and textual explanation of a meme. Experimental results demonstrate that training with multimodal explanations improves performance in generating textual justifications and in more accurately identifying the visual evidence supporting a decision, with reliable performance improvements.


Summary

  • The paper introduces the MultiBully-Ex dataset and a novel CLIP-based multimodal shared-private multitask model for detecting cyberbullying in memes.
  • It demonstrates significant improvements over single-task baselines, as measured by metrics such as ROUGE, BLEU, and the Dice coefficient.
  • The methodology underscores the importance of integrating both textual and visual cues to provide transparent and accurate explanations of harmful content.

Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations

Introduction

The proliferation of internet memes has significantly impacted the dissemination of socio-cultural and political ideas, with both positive and negative effects. "Meme-ingful Analysis" explores the darker side, particularly the rise of cyberbullying through memes, an area rich with multimodal content combining image and text. The paper introduces the MultiBully-Ex dataset and proposes a novel approach based on Contrastive Language-Image Pretraining (CLIP) to tackle the issue.

Figure 1: Cyberbullying explanation in memes. The aim is to highlight both the image regions and the text tokens that explain why the given meme constitutes bullying.

Multimodal Explanation in Memes

The paper stresses that memes are not merely humorous content but potential carriers of harmful cyberbullying messages. Traditional cyberbullying detection has largely focused on text; memes, however, require an analysis that considers both visual and textual elements to offer valid explanations. This multimodal nature demands a sophisticated approach, which the authors address with a dataset and method specifically tailored to code-mixed languages.

MultiBully-Ex Dataset

The dataset developed, MultiBully-Ex, is the first benchmark to offer multimodal explanations for code-mixed cyberbullying memes. An exhaustive manual annotation process ensures high-quality data, with rationales provided for both textual and visual cues.

Figure 2: CLIP projection-based (CP) multimodal shared-private multitask architecture.

Methodology

The methodology revolves around a CLIP projection-based multimodal shared-private multitask model. This approach leverages shared layers across tasks for both textual and visual classification, while task-specific private layers handle individual challenges like text segmentation and image feature extraction.
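The shared-private split can be sketched in plain NumPy: one shared layer feeds every task, while each task keeps its own private head. The layer sizes and the two task names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

dim_in, dim_shared, dim_task = 512, 256, 128  # assumed sizes

# Shared layer: parameters reused by every task.
w_s, b_s = rng.normal(size=(dim_in, dim_shared)), np.zeros(dim_shared)

# Private layers: one parameter set per task (textual vs. visual explanation).
private = {
    t: (rng.normal(size=(dim_shared, dim_task)), np.zeros(dim_task))
    for t in ("textual", "visual")
}

def forward(x, task):
    h = np.tanh(linear(x, w_s, b_s))     # shared representation
    w_p, b_p = private[task]
    return np.tanh(linear(h, w_p, b_p))  # task-specific representation

x = rng.normal(size=(1, dim_in))         # a fused meme feature vector (toy input)
out_text = forward(x, "textual")
out_vis = forward(x, "visual")
print(out_text.shape, out_vis.shape)     # (1, 128) (1, 128)
```

During training, gradients from both tasks update the shared parameters, while each private head is updated only by its own task's loss; that is what lets the tasks regularize each other.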

CLIP Projection-Based Cross-Modal Neck

The CLIP Projection-Based Cross-Modal Neck serves as a bridge between the textual and visual modalities. By employing modality-specific gating mechanisms, it modulates the interplay between image and text features before feeding both into task-specific sub-networks.

Figure 3: Human annotation vs. the proposed model's visual and textual explanations. Green highlights indicate agreement between the human annotator and the model; red highlights mark tokens predicted by the model but not selected by human annotators.
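The modality-specific gating idea can be sketched as follows. The gate input (a concatenation of both embeddings), the gate dimensions, and the additive fusion are assumptions made for illustration; the paper's exact neck design may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d = 256  # assumed projection dimension

# Projected CLIP embeddings for the two modalities (random stand-ins).
img, txt = rng.normal(size=d), rng.normal(size=d)

# Modality-specific gates: each stream decides how much of the other to admit.
w_gate_img = rng.normal(size=(2 * d, d)) * 0.01
w_gate_txt = rng.normal(size=(2 * d, d)) * 0.01

g_img = sigmoid(np.concatenate([img, txt]) @ w_gate_img)  # gate for image stream
g_txt = sigmoid(np.concatenate([txt, img]) @ w_gate_txt)  # gate for text stream

fused_img = img + g_img * txt  # image stream tempered by gated text
fused_txt = txt + g_txt * img  # text stream tempered by gated image
```

Because each gate is a sigmoid over both embeddings, every dimension of the cross-modal signal is scaled into (0, 1) per modality, which is what "tempering the interplay" amounts to in practice.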

Results and Discussions

The shared-private architecture demonstrated marked improvements over single-task baselines by enabling the model to draw on textual and visual cues more effectively. Metrics such as ROUGE, BLEU, and the Dice coefficient reflected significant performance gains. In particular, models employing the multi-head attention-based fusion mechanism outperformed simpler dot-product attention, benefiting from a more sophisticated handling of multimodal data.
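For the visual side, the Dice coefficient scores the overlap between a predicted highlight mask and the human-annotated mask. A minimal implementation (the two masks below are toy examples, not data from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gold = np.array([[1, 0, 0],
                 [0, 1, 1]])
print(round(dice(pred, gold), 3))  # 0.667
```

The epsilon term keeps the score defined when both masks are empty, a common convention when evaluating segmentation-style explanations.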

Implications and Future Work

This research sets the stage for comprehensive multimodal cyberbullying detection in code-mixed contexts, stressing the need for transparency in AI models through explainability. Future work could extend the framework to other languages and explore the detection of underlying stereotypes and implicit content in memes.

Conclusion

"Meme-ingful Analysis" offers a critical advancement in understanding and explaining cyberbullying in memes. By focusing on both visual and textual elements, and providing a robust dataset and sophisticated modeling approach, the research effectively addresses the complexity inherent in meme-based cyberbullying. Future expansions will further enhance its applicability across different cultural contexts and languages.
