How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs (2311.16101v1)
Abstract: This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Unlike prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness. For the OOD evaluation, we present two novel VQA datasets, each with one variant, designed to test model performance under challenging conditions. In exploring adversarial robustness, we propose a straightforward attack strategy for misleading VLLMs into producing responses unrelated to the visual input. Moreover, we assess the efficacy of two jailbreaking strategies, targeting either the vision or the language component of VLLMs. Our evaluation of 21 diverse models, ranging from open-source VLLMs to GPT-4V, yields interesting observations: 1) current VLLMs struggle with OOD texts but not OOD images, unless the visual information is limited; and 2) these VLLMs can be easily misled by deceiving their vision encoders alone, and their vision-language training often compromises safety protocols. We release this safety evaluation suite at https://github.com/UCSC-VLAA/vLLM-safety-benchmark.
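To make the "deceiving the vision encoder only" idea concrete, below is a minimal, hypothetical sketch of one way such an attack could look: an L-infinity PGD perturbation that pushes a clean image's CLIP embedding toward that of an unrelated target image, so a VLLM built on that encoder may describe content that is not in the picture. The model checkpoint, perturbation budget, step count, and file names (`clean.png`, `target.png`) are illustrative assumptions, not the paper's actual method or settings.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Hypothetical sketch: L_inf PGD on a frozen CLIP vision encoder, pushing a clean
# image's embedding toward that of an unrelated target image. All hyperparameters
# and file names are illustrative, not the paper's settings.
model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
for p in model.parameters():
    p.requires_grad_(False)

def embed(pixel_values):
    # Unit-normalized image embedding from the projection head.
    return F.normalize(model(pixel_values=pixel_values).image_embeds, dim=-1)

clean = processor(images=Image.open("clean.png"), return_tensors="pt").pixel_values
target = processor(images=Image.open("target.png"), return_tensors="pt").pixel_values
target_emb = embed(target)

# Note: the budget below lives in the processor-normalized space, a simplification
# of a raw-pixel L_inf constraint.
eps, alpha, steps = 8 / 255, 1 / 255, 100
delta = torch.zeros_like(clean, requires_grad=True)
for _ in range(steps):
    loss = F.cosine_similarity(embed(clean + delta), target_emb).mean()
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()  # ascend on similarity to the target embedding
        delta.clamp_(-eps, eps)             # project back into the L_inf ball
        delta.grad.zero_()

adversarial = (clean + delta).detach()  # pixel_values to hand to the VLLM's vision tower
```

In this sketch, the perturbed tensor would be paired with an ordinary benign prompt; if the downstream VLLM then describes the target image's content, the vision encoder alone was enough to mislead the whole pipeline, consistent with the observation above.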