GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse (2401.01523v4)

Published 3 Jan 2024 in cs.CL and cs.AI

Abstract: The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and image. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g., GPT-4o) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. The GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.


Summary

  • The paper introduces GOAT-Bench, a comprehensive dataset of over 6,000 memes to evaluate LMMs on detecting nuanced social abuse.
  • The evaluation shows that leading models, including GPT-4V, struggle with fine-grained tasks such as misogyny and sarcasm detection.
  • Self-alignment training methods improve LMM interpretability and performance, highlighting a need for enhanced safety in ethical AI development.

GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

The paper "GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse" presents a benchmark for evaluating large multimodal models (LMMs) in the context of social abuse through memes. The authors introduce GOAT-Bench, a comprehensive dataset of over 6,000 memes, designed to probe the capabilities of LMMs in recognizing subtle forms of abuse encoded in multimodal content. This paper provides key insights into the limitations and potential of current LMMs in this socially critical domain.

Introduction

GOAT-Bench is motivated by the rapid proliferation of memes as vehicles of online social abuse, exploiting the combination of text and imagery to encode harmful content. As LMMs gain prominence, their ability to handle nuanced interpretations of multimodal data becomes vital. The benchmark comprises tasks such as detecting hatefulness, misogyny, offensiveness, sarcasm, and harmfulness in memes. These tasks challenge LMMs to discern complex implicit content that often escapes straightforward textual or visual analysis.

The authors assess a variety of state-of-the-art LMMs on the benchmark, revealing critical gaps in their safety awareness and sensitivity to abuse. This analysis indicates that while LMMs have advanced significantly, they remain inadequate in addressing the intricacies of meme-based social abuse. This inadequacy highlights the urgent need for improved model alignment with human values to mitigate the risk of unintentional harm by AI systems.

Figure 1: Performance of a broad range of representative LMMs, including CogVLM, InstructBLIP, LLaVA-1.5, MiniGPT-4, Qwen-VL, and GPT-4V(ision), on GOAT-Bench. GPT-4V achieves the best overall performance across the five task perspectives.

Methodology and Model Evaluation

The GOAT Benchmark

GOAT-Bench consists of carefully curated meme datasets sourced from prior literature and tailored to reflect the nuanced difficulties of multimodal abuse detection. It comprises five tasks, each targeting a specific facet of social abuse, reflecting the variety and complexity of challenges LMMs must navigate.

Figure 2: GOAT-Bench is a comprehensive dataset covering five interwoven meme tasks.

The tasks are defined as follows (a sketch of one possible data record appears after the list):

  1. Hatefulness: Memes targeting groups or individuals with dehumanizing language.
  2. Misogyny: Content that objectifies or discriminates against women.
  3. Offensiveness: Memes intended to provoke or offend without targeting specific groups.
  4. Sarcasm: Use of incongruity between text and imagery to veil an insult.
  5. Harmfulness: Content broadly encompassing abusive elements that cause societal harm.
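
Although the dataset's exact schema is not reproduced in this summary, a single benchmark instance can be pictured as an image-text pair annotated for one of the five tasks with a binary label. The Python sketch below is illustrative only; every field name is an assumption, not GOAT-Bench's actual format.

```python
from dataclasses import dataclass

@dataclass
class MemeInstance:
    """Hypothetical layout of one benchmark instance: a meme image,
    its overlaid text, the task it is annotated for, and a label."""
    image_path: str      # path to the meme image file
    overlaid_text: str   # text overlaid on (or extracted from) the image
    task: str            # "hatefulness", "misogyny", "offensiveness",
                         # "sarcasm", or "harmfulness"
    label: bool          # True if the meme is abusive under this task

# Placeholder example; the text is elided rather than invented.
example = MemeInstance(
    image_path="memes/000123.png",
    overlaid_text="...",
    task="sarcasm",
    label=True,
)
```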

Experimental Setup

The paper scrutinizes 11 cutting-edge LMMs, comparing their performance under zero-shot prompting, chain-of-thought (CoT) prompting, and in-context learning. These models include both proprietary systems (e.g., GPT-4V) and open-source frameworks (e.g., MiniGPT-4, Qwen-VL).
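
The paper's exact prompt templates are not reproduced here; the following is a minimal sketch of how a zero-shot query might be turned into a CoT query for one binary safety task, with all wording assumed for illustration.

```python
def build_prompt(task: str, use_cot: bool = False) -> str:
    """Assemble a binary-classification question for one GOAT-Bench task.

    The meme image itself would be supplied to the LMM alongside this
    text; the phrasing here is illustrative, not the paper's template.
    """
    prompt = f"Is this meme {task}? Answer 'yes' or 'no'."
    if use_cot:
        # Chain-of-thought: elicit step-by-step reasoning over the
        # text-image interplay before the final verdict.
        prompt += " Let's think step by step before giving the final answer."
    return prompt

zero_shot_prompt = build_prompt("hateful")            # zero-shot setting
cot_prompt = build_prompt("hateful", use_cot=True)    # CoT setting
```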

Results demonstrate variation in performance across tasks, with GPT-4V generally leading in alignment and accuracy. However, even the best models exhibit only moderate success, emphasizing the complexity and implicitness of meme-based social abuse.

Figure 3: Comparison of the overall macro-averaged F1 scores (%) of different LMMs with CoT prompts on GOAT-Bench across tasks.
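
Macro-averaged F1 scores classes equally: F1 is computed per class and the unweighted mean is taken, so a model cannot score well by favoring the majority class. A minimal sketch of the computation with toy labels, using scikit-learn:

```python
from sklearn.metrics import f1_score

# Toy gold labels and predictions for one task (1 = abusive, 0 = benign).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# average="macro" computes F1 for each class and takes the unweighted
# mean, so abusive and benign classes count equally even if imbalanced.
print(f"macro-F1: {f1_score(y_true, y_pred, average='macro'):.3f}")
```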

Key Findings

The paper reports that current LMMs achieve suboptimal performance on meme tasks. In fine-grained safety tasks such as detecting misogyny and sarcasm, models struggle substantially. CoT prompting improves some models' performance, but the gains are inconsistent, indicating the difficulty of modeling such indirect content.

The proposed self-alignment method enhances meme abuse detection by instructing models to generate simple rationales during training, which reduces the need for human oversight during supervised fine-tuning. Results indicate that self-alignment improves both interpretability and performance across multiple meme-related tasks.
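
Concretely, this amounts to converting each training meme into an instruction-following record whose target contains a short rationale followed by the verdict. The sketch below shows one way such a record might be built; the record format is an assumption rather than the authors' exact schema, and in the paper's setup the rationale is model-generated, not human-written as the placeholder here suggests.

```python
def make_self_alignment_example(meme_text: str, task: str,
                                rationale: str, label: bool) -> dict:
    """Pack one meme into a hypothetical instruction-tuning record
    whose target is a short rationale followed by the verdict."""
    verdict = "yes" if label else "no"
    return {
        "instruction": (f"Meme text: \"{meme_text}\"\n"
                        f"Is this meme {task}? Explain briefly, "
                        f"then answer yes or no."),
        # Training on rationale-then-verdict targets teaches the model
        # to justify its safety judgment, not just emit a label.
        "output": f"{rationale} Answer: {verdict}",
    }

record = make_self_alignment_example(
    meme_text="...",  # elided; would be the meme's overlaid text
    task="misogynistic",
    rationale="The caption demeans women by reducing them to a stereotype.",
    label=True,
)
```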

Case Studies

A detailed analysis of model predictions reveals substantial challenges in nuanced understanding. Even top-performing models like GPT-4V can misinterpret implicit connections, leading to erroneous conclusions. Case studies illustrate instances of both successes and failures in meme analysis, underlining areas for model improvement.

Figure 4: A hateful meme incorrectly predicted by GPT-4V, with the model's explanation.

Figure 5: A misogynistic meme correctly predicted by GPT-4V, with the model's explanation.

Conclusion

GOAT-Bench serves as a rigorous testing ground for evaluating the social safety capabilities of LMMs, revealing significant deficiencies in their ability to detect multimodal social abuse. Despite advancements, LMMs require enhanced training methodologies and alignment strategies to mitigate the risk of perpetuating harm through their outputs.

Future work can focus on advancing model architectures and fine-tuning strategies that emphasize ethical AI development, potentially informing policy and moderation practices for online abuse encoded in memes.
