GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse (2401.01523v4)

Published 3 Jan 2024 in cs.CL and cs.AI

Abstract: The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and image. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g., GPT-4o) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. The GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.


Summary

  • The paper introduces GOAT-Bench, a comprehensive dataset of over 6,000 memes to evaluate LMMs on detecting nuanced social abuse.
  • The evaluation shows that leading models, including GPT-4V, struggle with fine-grained tasks such as misogyny and sarcasm detection.
  • Self-alignment training methods improve LMM interpretability and performance, highlighting a need for enhanced safety in ethical AI development.

GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

The paper "GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse" presents a benchmark for evaluating large multimodal models (LMMs) in the context of social abuse through memes. The authors introduce GOAT-Bench, a comprehensive dataset of over 6,000 memes, designed to probe the capabilities of LMMs in recognizing subtle forms of abuse encoded in multimodal content. This paper provides key insights into the limitations and potential of current LMMs in this socially critical domain.

Introduction

GOAT-Bench is motivated by the rapid proliferation of memes as vehicles of online social abuse, exploiting the combination of text and imagery to encode harmful content. As LMMs gain prominence, their ability to handle nuanced interpretations of multimodal data becomes vital. The benchmark comprises tasks such as detecting hatefulness, misogyny, offensiveness, sarcasm, and harmfulness in memes. These tasks challenge LMMs to discern complex implicit content that often escapes straightforward textual or visual analysis.

The authors assess a variety of state-of-the-art LMMs on the benchmark, revealing critical gaps in their safety awareness and sensitivity to abuse. This analysis indicates that while LMMs have advanced significantly, they remain inadequate in addressing the intricacies of meme-based social abuse. This inadequacy highlights the urgent need for improved model alignment with human values to mitigate the risk of unintentional harm by AI systems.

Figure 1: Performance of a broad range of representative LMMs, including CogVLM, InstructBLIP, LLaVA-1.5, MiniGPT-4, Qwen-VL, and GPT-4V(ision), on GOAT-Bench. GPT-4V achieves the best overall performance across the five task perspectives.

Methodology and Model Evaluation

The GOAT Benchmark

GOAT-Bench consists of carefully curated meme datasets sourced from prior literature and tailored to reflect the nuanced difficulties of multimodal abuse detection. It comprises five tasks, each targeting a specific facet of social abuse, reflecting the variety and complexity of challenges LMMs must navigate.

Figure 2: GOAT-Bench is a comprehensive dataset covering five interwoven meme tasks.

The tasks are defined as follows (a sketch of one possible data record appears after the list):

  1. Hatefulness: Memes targeting groups or individuals with dehumanizing language.
  2. Misogyny: Content that objectifies or discriminates against women.
  3. Offensiveness: Memes intended to provoke or offend without targeting specific groups.
  4. Sarcasm: Use of incongruity between text and imagery to veil an insult.
  5. Harmfulness: Content broadly encompassing abusive elements that cause societal harm.
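
Although the dataset's exact schema is not reproduced in this summary, a single benchmark instance can be pictured as an image-text pair annotated for one of the five tasks with a binary label. The Python sketch below is illustrative only; every field name is an assumption, not GOAT-Bench's actual format.

```python
from dataclasses import dataclass

@dataclass
class MemeInstance:
    """Hypothetical layout of one benchmark instance: a meme image,
    its overlaid text, the task it is annotated for, and a label."""
    image_path: str      # path to the meme image file
    overlaid_text: str   # text overlaid on (or extracted from) the image
    task: str            # "hatefulness", "misogyny", "offensiveness",
                         # "sarcasm", or "harmfulness"
    label: bool          # True if the meme is abusive under this task

# Placeholder example; the text is elided rather than invented.
example = MemeInstance(
    image_path="memes/000123.png",
    overlaid_text="...",
    task="sarcasm",
    label=True,
)
```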

Experimental Setup

The paper scrutinizes 11 cutting-edge LMMs, comparing their performance under zero-shot prompting, chain-of-thought (CoT) prompting, and in-context learning. These models include both proprietary systems (e.g., GPT-4V) and open-source frameworks (e.g., MiniGPT-4, Qwen-VL).
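
The paper's exact prompt templates are not reproduced here; the following is a minimal sketch of how a zero-shot query might be turned into a CoT query for one binary safety task, with all wording assumed for illustration.

```python
def build_prompt(task: str, use_cot: bool = False) -> str:
    """Assemble a binary-classification question for one GOAT-Bench task.

    The meme image itself would be supplied to the LMM alongside this
    text; the phrasing here is illustrative, not the paper's template.
    """
    prompt = f"Is this meme {task}? Answer 'yes' or 'no'."
    if use_cot:
        # Chain-of-thought: elicit step-by-step reasoning over the
        # text-image interplay before the final verdict.
        prompt += " Let's think step by step before giving the final answer."
    return prompt

zero_shot_prompt = build_prompt("hateful")            # zero-shot setting
cot_prompt = build_prompt("hateful", use_cot=True)    # CoT setting
```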

Results demonstrate variation in performance across tasks, with GPT-4V generally leading in alignment and accuracy. However, even the best models exhibit only moderate success, emphasizing the complexity and implicitness of meme-based social abuse.

Figure 3: Comparison of the overall macro-averaged F1 scores (%) of different LMMs with CoT prompts on GOAT-Bench across tasks.
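
Macro-averaged F1 scores classes equally: F1 is computed per class and the unweighted mean is taken, so a model cannot score well by favoring the majority class. A minimal sketch of the computation with toy labels, using scikit-learn:

```python
from sklearn.metrics import f1_score

# Toy gold labels and predictions for one task (1 = abusive, 0 = benign).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# average="macro" computes F1 for each class and takes the unweighted
# mean, so abusive and benign classes count equally even if imbalanced.
print(f"macro-F1: {f1_score(y_true, y_pred, average='macro'):.3f}")
```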

Key Findings

The paper reports that current LMMs achieve suboptimal performance on meme tasks. In fine-grained safety tasks such as detecting misogyny and sarcasm, models struggle substantially. CoT prompting improves some models' performance, but the gains are inconsistent, indicating the difficulty of modeling such indirect content.

The proposed self-alignment method enhances meme abuse detection by instructing models to generate simple rationales during training, which reduces the need for human oversight during supervised fine-tuning. Results indicate that self-alignment improves both interpretability and performance across multiple meme-related tasks.
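
Concretely, this amounts to converting each training meme into an instruction-following record whose target contains a short rationale followed by the verdict. The sketch below shows one way such a record might be built; the record format is an assumption rather than the authors' exact schema, and in the paper's setup the rationale is model-generated, not human-written as the placeholder here suggests.

```python
def make_self_alignment_example(meme_text: str, task: str,
                                rationale: str, label: bool) -> dict:
    """Pack one meme into a hypothetical instruction-tuning record
    whose target is a short rationale followed by the verdict."""
    verdict = "yes" if label else "no"
    return {
        "instruction": (f"Meme text: \"{meme_text}\"\n"
                        f"Is this meme {task}? Explain briefly, "
                        f"then answer yes or no."),
        # Training on rationale-then-verdict targets teaches the model
        # to justify its safety judgment, not just emit a label.
        "output": f"{rationale} Answer: {verdict}",
    }

record = make_self_alignment_example(
    meme_text="...",  # elided; would be the meme's overlaid text
    task="misogynistic",
    rationale="The caption demeans women by reducing them to a stereotype.",
    label=True,
)
```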

Case Studies

A detailed analysis of model predictions reveals substantial challenges in nuanced understanding. Even top-performing models like GPT-4V can misinterpret implicit connections, leading to erroneous conclusions. Case studies illustrate instances of both successes and failures in meme analysis, underlining areas for model improvement.

Figure 4: A hateful meme incorrectly predicted by GPT-4V, with the model's explanation.

Figure 5: A misogynistic meme correctly predicted by GPT-4V, with the model's explanation.

Conclusion

GOAT-Bench serves as a rigorous testing ground for evaluating the social safety capabilities of LMMs, revealing significant deficiencies in their ability to detect multimodal social abuse. Despite advancements, LMMs require enhanced training methodologies and alignment strategies to mitigate the risk of perpetuating harm through their outputs.

Future work can focus on advancing model architectures and fine-tuning strategies that emphasize ethical AI development, potentially informing policy and moderation practices for online abuse encoded in memes.
