Emergent Mind

Abstract

Social media platforms are increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity, cyberbullying, and self-harm. Consequently, major platforms use AI and human moderation to obfuscate such images and make them safer. Obfuscating unsafe images raises two critical needs: an accurate rationale for obfuscating image regions must be provided, and the sensitive regions must be obfuscated (e.g., blurred) for users' safety. This involves two key problems: (1) the platform must provide an accurate rationale grounded in attributes specific to the unsafe image, and (2) the unsafe regions of the image must be minimally obfuscated while the safe regions remain visible. In this work, we address these issues by first designing a visual reasoning model (VLM) conditioned on pre-trained unsafe image classifiers to provide an accurate rationale grounded in unsafe image attributes, and then proposing a counterfactual explanation algorithm that minimally identifies and obfuscates unsafe regions for safe viewing. The algorithm uses an unsafe image classifier's attribution matrix to guide segmentation toward a more optimal subregion partition, followed by an informed greedy search that determines, based on attribution scores, the minimum number of subregions that must be modified to change the classifier's output. Extensive experiments on uncurated data from social networks demonstrate the efficacy of our proposed method. We make our code available at: https://github.com/SecureAIAutonomyLab/ConditionalVLM

Overview

  • Introduction of an advanced Conditional Vision Language Model (ConditionalVLM) and Counterfactual Subobject Explanation (CSE) method for social media content moderation.

  • ConditionalVLM generates specific rationales for obfuscation in unsafe image content, considering the unique attributes of each category.

  • CSE algorithm identifies and obfuscates only the unsafe parts of images using a FullGrad-based model and informed greedy search strategy.

  • Performance of ConditionalVLM and CSE is tested on uncurated social network datasets, showing superior results in rationale generation and efficient obfuscation.

  • Framework's contributions include improved rationale accuracy, minimal obfuscation for investigations, and reduced harmful exposure for moderators and users.

Overview

The field of social media content moderation has seen a significant development with the introduction of an advanced visual reasoning model, known as Conditional Vision Language Model (ConditionalVLM), and a Counterfactual Subobject Explanation (CSE) method. This novel framework addresses the dual problem of providing clear rationales for obfuscating unsafe images and accurately pinpointing the segments necessary for obfuscation.

Visual Reasoning with ConditionalVLM

The ConditionalVLM is designed to generate comprehensive and specific rationales for obfuscating images depicting sexual activity, cyberbullying, and self-harm. It is conditioned on pre-trained unsafe image classifiers, so the rationale factors in the attributes particular to each category, such as explicit gestures in cyberbullying or distinguishing marks on skin in sexually explicit content. This enables the VLM to generate explanations that are not only pertinent but also preserve the integrity of context for future investigations or evidence collection.
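
As a rough illustration of this conditioning strategy, the sketch below routes a frozen unsafe-image classifier's prediction into a category-specific instruction for a generic vision-language model. The classifier, the `vlm.generate` interface, the category list, and the prompt templates are all hypothetical stand-ins; the paper's ConditionalVLM conditions the model on the classifier in its own way and may differ substantially.

```python
# Minimal sketch of classifier-conditioned rationale generation.
# All interfaces here are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

CATEGORIES = ["safe", "sexually_explicit", "cyberbullying", "self_harm"]

# Category-specific instructions grounded in unsafe-image attributes.
PROMPTS = {
    "sexually_explicit": "Describe the explicit attributes (exposed regions, "
                         "distinguishing marks) that justify obfuscation.",
    "cyberbullying":     "Describe the offensive gestures or text that justify "
                         "obfuscation.",
    "self_harm":         "Describe the injury-related attributes that justify "
                         "obfuscation.",
    "safe":              "Explain why no obfuscation is required.",
}

class ConditionedRationaleGenerator(nn.Module):
    """Conditions a vision-language model on a frozen unsafe-image classifier."""

    def __init__(self, unsafe_classifier: nn.Module, vlm):
        super().__init__()
        self.classifier = unsafe_classifier.eval()   # pre-trained, kept frozen
        for p in self.classifier.parameters():
            p.requires_grad_(False)
        self.vlm = vlm                               # any image+text -> text model

    @torch.no_grad()
    def forward(self, image: torch.Tensor) -> str:
        # 1) Predict the unsafe category with the frozen classifier.
        probs = F.softmax(self.classifier(image.unsqueeze(0)), dim=-1)[0]
        category = CATEGORIES[int(probs.argmax())]
        # 2) Condition the VLM on a category-specific instruction.
        prompt = f"The image was flagged as '{category}'. " + PROMPTS[category]
        return self.vlm.generate(image=image, prompt=prompt)
```

Keeping the classifier frozen ties the generated rationale to the same category decision that triggered moderation, rather than to the VLM's own unconstrained guess.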

Counterfactual Subobject Explanations for Obfuscation

The paper then introduces a novel counterfactual explanation algorithm, Counterfactual Subobject Explanation (CSE), that identifies and obfuscates only the unsafe regions of an image. By leveraging a FullGrad-based attribution matrix to guide Bayesian superpixel segmentation, the method efficiently identifies the key subregions. An informed greedy search then finds the minimum set of subregions whose modification shifts the classifier's decision, ensuring maximum retention of the safe parts of the image.
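
The greedy step can be sketched as follows, assuming the attribution map and the subregion segmentation are already computed upstream (for example, FullGrad scores and superpixel labels). The function name and the average-pool blur are illustrative choices, not the paper's exact implementation.

```python
# Sketch of the informed greedy search: blur the highest-attribution subregions
# until the unsafe classifier's decision flips. Attribution scores and segment
# labels are assumed inputs from upstream steps; details may differ from CSE.
import numpy as np
import torch
import torch.nn.functional as F

def greedy_counterfactual_obfuscation(image, segments, attribution,
                                      classifier, unsafe_class, blur_ksize=31):
    """
    image:       (3, H, W) float tensor in [0, 1]
    segments:    (H, W) int array of subregion labels (e.g., superpixels)
    attribution: (H, W) float array of per-pixel attribution scores
    classifier:  batched image -> logits over {safe, unsafe, ...} categories
    """
    # Rank subregions by their total attribution toward the unsafe class.
    labels = np.unique(segments)
    scores = {l: attribution[segments == l].sum() for l in labels}
    ranked = sorted(labels, key=lambda l: scores[l], reverse=True)

    # Precompute a blurred copy used to obfuscate selected subregions.
    blurred = F.avg_pool2d(image.unsqueeze(0), blur_ksize, stride=1,
                           padding=blur_ksize // 2).squeeze(0)

    edited = image.clone()
    chosen = []
    for label in ranked:
        mask = torch.from_numpy(segments == label)
        edited[:, mask] = blurred[:, mask]          # obfuscate this subregion
        chosen.append(label)
        with torch.no_grad():
            pred = classifier(edited.unsqueeze(0)).argmax(dim=-1).item()
        if pred != unsafe_class:                    # decision flipped: stop early
            return edited, chosen
    return edited, chosen
```

Ranking subregions by aggregate attribution before the search is what makes the greedy pass "informed": high-attribution regions are obfuscated first, so the classifier's decision typically flips after only a few edits and most of the safe image content is left untouched.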

Experimental Efficacy

The paper evaluates both components, ConditionalVLM and CSE, on uncurated datasets from social networks, demonstrating their efficacy. ConditionalVLM surpasses other state-of-the-art models in providing accurate rationales for content obfuscation. The CSE method, in turn, identifies fewer subregions to modify while maintaining high accuracy in segmenting only the unsafe portions for obfuscation.

Contributions and Implications

The combined ConditionalVLM and CSE approach makes several notable contributions: accurate rationale generation for image obfuscation grounded in unsafe attributes, minimal and seamless obfuscation that aids investigations, and a substantial reduction in exposure to harmful content for moderators and law enforcement agents. The codebase is publicly available, encouraging further research and development in the domain. The implications are significant: the approach safeguards those on the frontlines of content moderation while also protecting vulnerable users from harmful exposure.
