
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually (2401.11035v1)

Published 19 Jan 2024 in cs.CV

Abstract: Social media platforms are being increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity, cyberbullying, and self-harm. Consequently, major platforms use AI and human moderation to obfuscate such images to make them safer. Two critical needs for obfuscating unsafe images are that an accurate rationale for obfuscating image regions must be provided, and the sensitive regions should be obfuscated (e.g., blurring) for users' safety. This process involves addressing two key problems: (1) the reason for obfuscating unsafe images demands the platform to provide an accurate rationale that must be grounded in unsafe image-specific attributes, and (2) the unsafe regions in the image must be minimally obfuscated while still depicting the safe regions. In this work, we address these key issues by first performing visual reasoning by designing a visual reasoning model (VLM) conditioned on pre-trained unsafe image classifiers to provide an accurate rationale grounded in unsafe image attributes, and then proposing a counterfactual explanation algorithm that minimally identifies and obfuscates unsafe regions for safe viewing, by first utilizing an unsafe image classifier attribution matrix to guide segmentation for a more optimal subregion segmentation followed by an informed greedy search to determine the minimum number of subregions required to modify the classifier's output based on attribution score. Extensive experiments on uncurated data from social networks emphasize the efficacy of our proposed method. We make our code available at: https://github.com/SecureAIAutonomyLab/ConditionalVLM

References (54)
  1. 2021. Facebook moderator: ‘Every day was a nightmare’. https://www.bbc.com/news/technology-57088382.
  2. 2021. Judge OKs $85 mln settlement of Facebook moderators’ PTSD claims. https://www.reuters.com/legal/transactional/judge-oks-85-mln-settlement-facebook-moderators-ptsd-claims-2021-07-23/.
  3. 2023. Vicuna. https://github.com/lm-sys/FastChat.
  4. SLIC superpixels. Technical report.
  5. “When a Tornado Hits Your Life:” Exploring Cyber Sexual Abuse Survivors’ Perspectives on Recovery. Journal of Counseling Sexology & Sexual Wellness: Research, Practice, and Education, 4(1): 1–8.
  6. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35: 23716–23736.
  7. Are, C. 2020. How Instagram’s algorithm is censoring women and vulnerable users but helping online abusers. Feminist media studies, 20(5): 741–744.
  8. Young people, peer-to-peer grooming and sexual offending: Understanding and responding to harmful sexual behaviour within a social media society. Probation Journal, 62(4): 374–388.
  9. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language. arXiv preprint arXiv:2306.16410.
  10. Towards targeted obfuscation of adversarial unsafe images using reconstruction and counterfactual super region attribution explainability. In 32nd USENIX Security Symposium (USENIX Security 23), 643–660.
  11. Billy Perrigo. 2019. Facebook Says It’s Removing More Hate Speech Than Ever Before. But There’s a Catch.
  12. Binder, M. 2019. Facebook claims its new AI technology can automatically detect revenge porn. https://mashable.com/article/facebook-ai-tool-revenge-porn.
  13. Bronstein, C. 2021. Deplatforming sexual speech in the age of FOSTA/SESTA. Porn Studies, 8(4): 367–380.
  14. Cabral, L., Haucap, J., Parker, G., Petropoulos, G., Valletti, T., and Van Alstyne, M. The EU Digital Markets Act: a report from a panel of economic experts. Publications Office of the European Union, Luxembourg.
  15. A combinatorial approach to explaining image classifiers. In 2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 35–43. IEEE.
  16. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE winter conference on applications of computer vision (WACV), 839–847. IEEE.
  17. Minority report: Cyberbullying prediction on Instagram. In Proceedings of the 10th ACM conference on web science, 37–45.
  18. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. arXiv:2305.06500.
  19. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248–255. Ieee.
  20. Exon, J. 1996. The Communications Decency Act. Federal Communications Law Journal, 49(1): 4.
  21. Eva: Exploring the limits of masked visual representation learning at scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19358–19369.
  22. Efficient graph-based image segmentation. International journal of computer vision, 59(2): 167–181.
  23. Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs. In BMVC.
  24. Gesley, J. 2021. Germany: Network Enforcement Act Amended to Better Fight Online Hate Speech. Library of Congress. https://www.loc.gov/item/global-legal-monitor/2021-07-06/germany-network-enforcement-act-amended-to-better-fight-online-hate-speech/.
  25. PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam.
  26. Harm and offence in media content: A review of the evidence.
  27. Hendricks, T. 2021. Cyberbullying increased 70% during the pandemic; Arizona schools are taking action. https://www.12news.com/article/news/crime/cyberbullying-increased-70-during-the-pandemic-arizona-schools-are-taking-action/75-fadf8d2c-cf11-43f0-b074-5de485a3247d.
  28. Self-harm, suicidal behaviours, and cyberbullying in children and young people: Systematic review. Journal of Medical Internet Research, 20(4).
  29. Kim, A. 2021. NSFW Data Scraper. https://github.com/alex000kim/nsfw_data_scraper.
  30. Segment anything. arXiv preprint arXiv:2304.02643.
  31. Krause, M. 2009. Identifying and managing stress in child pornography and child exploitation investigators. Journal of Police and Criminal Psychology, 24(1): 22–29.
  32. Nonconsensual image sharing: one in 25 Americans has been a victim of "revenge porn".
  33. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7241–7259.
  34. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.
  35. Effectiveness and users’ experience of obfuscation as a privacy-enhancing technology for sharing photos. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW): 1–24.
  36. Meta. 2022. Appealed Content. https://transparency.fb.com/policies/improving/appealed-content-metric/.
  37. Compact watershed and preemptive slic: On improving trade-offs of superpixel segmentation algorithms. In 2014 22nd international conference on pattern recognition, 996–1001. IEEE.
  38. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32, 8024–8035. Curran Associates, Inc.
  39. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748–8763. PMLR.
  40. Ramaswamy, H. G.; et al. 2020. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 983–991.
  41. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125.
  42. A practitioner survey exploring the value of forensic tools, AI, filtering, & safer presentation for investigating child sexual abuse material (CSAM). Digital Investigation, 29.
  43. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626.
  44. Full-gradient representation for neural network visualization. Advances in neural information processing systems, 32.
  45. The psychological well-being of content moderators: the emotional labor of commercial moderation and avenues for improving support. In Proceedings of the 2021 CHI conference on human factors in computing systems, 1–14.
  46. Tenbarge, K. 2023. Instagram’s sex censorship sweeps up educators, adult stars and sex workers.
  47. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  48. Bayesian adaptive superpixel segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8470–8479.
  49. scikit-image: image processing in Python. PeerJ, 2: e453.
  50. Explainable image classification with evidence counterfactual. Pattern Analysis and Applications, 25(2): 315–335.
  51. Towards Understanding and Detecting Cyberbullying in Real-world Images. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA).
  52. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31: 841.
  53. Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework. In International Conference on Machine Learning, 23318–23340. PMLR.
  54. Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19175–19186.

Summary

  • The paper proposes a dual-module framework that integrates ConditionalVLM for classification and counterfactual techniques for precise obfuscation of unsafe regions.
  • The method uses Bayesian Adaptive Superpixel Segmentation and FullGrad attribution to segment and target sensitive image areas while preserving safe content.
  • Experimental results show enhanced detection accuracy and comprehensive rationale generation compared to models like mPLUG and OFA-Large.

Image Safeguarding Using Conditional Vision-Language Models

Introduction

The paper "Image Safeguarding: Reasoning with Conditional Vision LLM and Obfuscating Unsafe Content Counterfactually" addresses the critical issue of obfuscating unsafe images shared on social media platforms. This work introduces a dual-module approach to ensure both effective obfuscation of unsafe regions and the generation of accurate rationales for these actions. The proposed method leverages the Conditional Vision LLM (ConditionalVLM) for reasoning and uses counterfactual visual explanations to determine and obfuscate sensitive regions within images.

Conditional Vision-Language Model

The ConditionalVLM forms the first module of the proposed architecture, designed to classify images as safe or unsafe by integrating visual reasoning with conditional inspection.

Architecture

The ConditionalVLM employs a pre-trained vision transformer (ViT) to extract visual embeddings, which are further processed through a Conditional Image Instruction-guided Transformer (CIIT). This step conditions the model's outputs on pre-trained unsafe image classifiers, enhancing its ability to generate rationales grounded in specific unsafe image features (Figure 1).

Figure 1: Overview of the proposed architecture. The initial module utilizes ConditionalVLM for classifying images as safe or unsafe, while the subsequent module proposes counterfactual visual explanations to identify and obfuscate the unsafe regions within the image.
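
As a rough illustration of this conditioning step, the sketch below prepends a projection of the unsafe classifier's output distribution to the ViT's visual tokens before they reach the instruction-guided transformer. The module name, the fusion-by-concatenation choice, and the dimensions are assumptions made for illustration; this is not the released ConditionalVLM implementation.

```python
# Illustrative sketch (assumptions noted above): condition visual tokens on a
# frozen unsafe-image classifier by injecting a class-distribution token.
import torch
import torch.nn as nn

class ConditionalVisualEncoder(nn.Module):
    def __init__(self, vit, unsafe_classifier, num_classes, embed_dim=768):
        super().__init__()
        self.vit = vit                        # frozen ViT backbone (assumed to return patch tokens)
        self.classifier = unsafe_classifier   # frozen unsafe-image classifier
        self.class_proj = nn.Linear(num_classes, embed_dim)

    def forward(self, images):
        # Patch-level visual embeddings from the frozen ViT: (B, N, D)
        visual_tokens = self.vit(images)
        # Unsafe-category probabilities from the frozen classifier: (B, C)
        with torch.no_grad():
            class_probs = self.classifier(images).softmax(dim=-1)
        # Project the class distribution into the embedding space and prepend
        # it as a conditioning token for the downstream transformer: (B, N + 1, D)
        cond_token = self.class_proj(class_probs).unsqueeze(1)
        return torch.cat([cond_token, visual_tokens], dim=1)
```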

Counterfactual Visual Explanations

The second module uses a counterfactual approach to identify and obfuscate unsafe regions in images. This technique minimizes obfuscation of safe areas, ensuring only sensitive portions are obscured, which is crucial for maintaining the utility of images during investigations.

Methodology

The image is first segmented into subregions using Bayesian Adaptive Superpixel Segmentation (BASS), with the segmentation guided by FullGrad attribution maps from the unsafe-image classifier so that regions contributing most to the classification outcome are prioritized. An informed greedy search then determines the minimal set of subregions that must be obfuscated to change the classifier's verdict from unsafe to safe (Figure 2).

Figure 2: Examples of segmentation methods on a cyberbullying image. From top to bottom: (1) BASS, (2) SLIC, (3) SAM.
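
The sketch below illustrates this attribution-guided greedy loop under simplifying assumptions: SLIC stands in for BASS, the mean attribution per superpixel stands in for the paper's attribution-based ordering, and `classifier` and `attribution_map` (e.g., a FullGrad saliency map) are supplied by the caller. It is not the authors' released implementation.

```python
# Minimal sketch of attribution-guided greedy obfuscation (assumptions above).
import numpy as np
import torch
import torchvision.transforms.functional as TF
from skimage.segmentation import slic

def greedy_obfuscate(image, classifier, attribution_map, unsafe_class,
                     n_segments=50, blur_sigma=12.0):
    """Blur the fewest high-attribution superpixels needed to flip the
    classifier's prediction away from `unsafe_class`.

    image:            (3, H, W) float tensor in [0, 1]
    classifier:       callable mapping a (1, 3, H, W) batch to class logits
    attribution_map:  (H, W) numpy array, e.g. a FullGrad saliency map
    """
    # 1. Segment into subregions (SLIC here; the paper uses BASS guided by
    #    the classifier's attribution matrix).
    segments = slic(image.permute(1, 2, 0).numpy(),
                    n_segments=n_segments, start_label=0)

    # 2. Rank subregions by mean attribution score (highest first).
    region_ids = np.unique(segments)
    scores = [attribution_map[segments == r].mean() for r in region_ids]
    ranked = [r for _, r in sorted(zip(scores, region_ids), reverse=True)]

    # 3. Greedily blur regions until the unsafe prediction flips.
    blurred = TF.gaussian_blur(image, kernel_size=31, sigma=blur_sigma)
    result = image.clone()
    for region in ranked:
        mask = torch.from_numpy(segments == region)
        result[:, mask] = blurred[:, mask]
        pred = classifier(result.unsqueeze(0)).argmax(dim=1).item()
        if pred != unsafe_class:
            break  # minimal set (under this greedy ordering) found
    return result
```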

Experimental Evaluation

The efficacy of the ConditionalVLM and the obfuscation technique was validated on datasets encompassing various categories of unsafe content, such as nudity and cyberbullying. ConditionalBLIP, an adaptation of InstructBLIP integrating CIIT, outperformed existing models in accurately generating descriptions for unsafe images. It achieved notable improvements in detection accuracy and provided comprehensive explanations of unsafe features when compared to other models like mPLUG and OFA-Large.

Counterfactual Evaluation

Counterfactual subobject explanations achieved high success rates in obfuscating unsafe image regions while preserving the integrity of safe areas, improving markedly over prior approaches such as CSRA under realistic testing constraints.

Conclusion

This paper contributes to advancing image safeguarding techniques on social media by introducing a robust architecture that combines conditional reasoning with precise obfuscation strategies. ConditionalVLM and counterfactual analysis not only enhance the ability to obfuscate sensitive content efficiently but also ensure the rationale provided is grounded in domain-specific unsafe attributes. These innovations ultimately support safer content sharing and aid in exposure reduction for content moderators and investigative use cases. The proposed methodology has the potential to be integrated into broader AI-driven moderation systems across various platforms.
