Emergent Mind

Red Teaming Visual Language Models

(2401.12915)
Published Jan 23, 2024 in cs.AI , cs.CL , and cs.CV

Abstract

VLMs (Vision-Language Models) extend the capabilities of LLMs (LLMs) to accept multimodal inputs. Since it has been verified that LLMs can be induced to generate harmful or inaccurate content through specific test cases (termed as Red Teaming), how VLMs perform in similar scenarios, especially with their combination of textual and visual inputs, remains a question. To explore this problem, we present a novel red teaming dataset RTVLM, which encompasses 10 subtasks (e.g., image misleading, multi-modal jail-breaking, face fairness, etc) under 4 primary aspects (faithfulness, privacy, safety, fairness). Our RTVLM is the first red-teaming dataset to benchmark current VLMs in terms of these 4 different aspects. Detailed analysis shows that 10 prominent open-sourced VLMs struggle with the red teaming in different degrees and have up to 31% performance gap with GPT-4V. Additionally, we simply apply red teaming alignment to LLaVA-v1.5 with Supervised Fine-tuning (SFT) using RTVLM, and this bolsters the models' performance with 10% in RTVLM test set, 13% in MM-Hal, and without noticeable decline in MM-Bench, overpassing other LLaVA-based models with regular alignment data. This reveals that current open-sourced VLMs still lack red teaming alignment. Our code and datasets will be open-source.

Overview of the RTVLM pipeline, highlighting data collection, evaluation, and alignment processes.

Overview

  • The study introduces the RTVLM dataset aimed to assess VLMs on Faithfulness, Safety, Privacy, and Fairness for secure deployment.

  • RTVLM encompasses ten subtasks with images from diffusion techniques and GPT-4 based questions to pinpoint VLM vulnerabilities.

  • Experimentation revealed VLM performance inconsistencies and deficiencies in red team alignment, with notable gaps discovered against GPT-4V.

  • Inclusion of RTVLM in Supervised Fine-tuning improves VLM performance on security-critical tasks, maintaining overall effectiveness.

  • The research advocates for VLM security and promotes RTVLM as an essential tool for enhancing safety and robustness in model outputs.

Summary of the Red Teaming Visual Language Models Study

Introduction

The emergence of Vision-Language Models (VLMs), which combine the textual and visual processing capabilities of LLMs, has broadened the spectrum of AI applications. Despite the evident progress in VLMs, the lack of systematic red teaming benchmarks prompted the introduction of the Red Teaming Visual Language Model (RTVLM) dataset. This newly constructed dataset assesses VLMs in areas crucial for secure deployment: Faithfulness, Safety, Privacy, and Fairness.

RTVLM Dataset Construction

RTVLM includes ten subtasks, each designed to target specific vulnerabilities within VLMs. The dataset ensures novelty by using images generated via diffusion techniques and human-annotated or GPT-4 generated questions. In evaluating Faithfulness, the dataset includes text and visual misleading tasks, and image order processing; Privacy is assessed through the distinction between public figures and private individuals, while Safety tests model responses to ethically risky inputs. For Fairness, VLMs are evaluated for bias towards individuals of varying races and genders.

Experimental Results

Upon evaluation, it was found that VLMs exhibit performance gaps in red teaming tasks and often lack red teaming alignment. The dataset served to benchmark and perform detailed analysis on 10 prominent VLMs, highlighting up to a 31% performance gap with GPT-4V. Incorporating RTVLM for Supervised Fine-tuning (SFT) into models like LLaVA-v1.5 improved performance significantly on the RTVLM test set and related benchmarks without degrading general performance, suggesting the necessity of incorporating red teaming alignment in the training process.

Red Teaming Alignment and Conclusions

The paper elucidates that current alignment practices in VLMs are insufficient when encountering red teaming scenarios. It also empirically demonstrates that directly aligning models with RTVLM improves both the safety and robustness of model outputs. The study concludes by underscoring the importance of VLM security, and the RTVLM dataset is posited as a valuable asset for advancing model security measures.

Create an account to read this summary for free:

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.