Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually (2401.11035v1)
Abstract: Social media platforms are increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity, cyberbullying, and self-harm. Consequently, major platforms rely on AI and human moderation to obfuscate such images and make them safer. Obfuscating unsafe images raises two critical needs: the platform must provide an accurate rationale for obfuscating image regions, and the sensitive regions should be obfuscated (e.g., blurred) for users' safety. This process involves addressing two key problems: (1) the rationale for obfuscating an unsafe image must be grounded in attributes specific to that image, and (2) the unsafe regions must be minimally obfuscated while the safe regions remain visible. In this work, we address these issues by first designing a vision language model (VLM) conditioned on pre-trained unsafe image classifiers to produce an accurate rationale grounded in unsafe image attributes, and then proposing a counterfactual explanation algorithm that identifies and obfuscates the minimal unsafe regions for safe viewing. The algorithm first uses the unsafe image classifier's attribution matrix to guide segmentation toward a more optimal partition into subregions, and then performs an informed greedy search, ordered by attribution score, to determine the minimum number of subregions that must be modified to change the classifier's output. Extensive experiments on uncurated data from social networks demonstrate the efficacy of our proposed method. We make our code available at: https://github.com/SecureAIAutonomyLab/ConditionalVLM
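The counterfactual obfuscation step can be summarized as an attribution-ranked greedy search over image subregions. The sketch below is a minimal illustration of that idea, not the authors' implementation: `classify_unsafe`, `segment_subregions`, and `attribution_map` are hypothetical stand-ins for the paper's pre-trained unsafe classifier, attribution-guided segmentation, and classifier attribution matrix.

```python
import numpy as np

# Hypothetical stand-ins (assumptions, not the paper's actual API):
#   classify_unsafe(img)                -> float in [0, 1], unsafe probability
#   segment_subregions(img, attr)       -> list of (H, W) boolean masks
#   attribution_map(img)                -> (H, W) attribution matrix

def blur_region(img: np.ndarray, mask: np.ndarray, k: int = 15) -> np.ndarray:
    """Return a copy of img with the masked region replaced by a box blur."""
    import scipy.ndimage as ndi  # requires SciPy
    blurred = ndi.uniform_filter(img.astype(float), size=(k, k, 1))
    out = img.astype(float).copy()
    out[mask] = blurred[mask]  # overwrite only the masked pixels
    return out.astype(img.dtype)

def greedy_counterfactual_obfuscation(img, classify_unsafe,
                                      segment_subregions, attribution_map,
                                      threshold: float = 0.5):
    """Blur the fewest subregions needed to flip the unsafe classifier.

    Subregions are visited in descending order of mean attribution score,
    so the search is 'informed' by the classifier's attributions rather
    than trying subregion subsets blindly.
    """
    attr = attribution_map(img)
    masks = segment_subregions(img, attr)
    # Rank subregions by mean attribution: most unsafe-relevant first.
    ranked = sorted(masks, key=lambda m: attr[m].mean(), reverse=True)

    obfuscated, chosen = img, []
    for mask in ranked:
        if classify_unsafe(obfuscated) < threshold:
            break  # classifier now reports the image as safe
        obfuscated = blur_region(obfuscated, mask)
        chosen.append(mask)
    return obfuscated, chosen
```

Under these assumptions, ranking by attribution turns an exponential search over subregion subsets into a single greedy pass, blurring high-attribution subregions one at a time until the classifier's output flips, which keeps the obfuscated area minimal while leaving safe regions untouched.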