Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 134 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 35 tok/s Pro
GPT-5 High 22 tok/s Pro
GPT-4o 97 tok/s Pro
Kimi K2 176 tok/s Pro
GPT OSS 120B 432 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Formalizing and Benchmarking Prompt Injection Attacks and Defenses (2310.12815v4)

Published 19 Oct 2023 in cs.CR, cs.AI, cs.CL, and cs.LG

Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Citations (35)

Summary

  • The paper introduces a formal framework that categorizes prompt injection attacks and proposes systematic defenses.
  • It demonstrates that combined attacks yield high efficacy across various tasks while proactive detection minimizes performance impact.
  • Experimental evaluations across ten LLMs and seven tasks affirm the framework’s potential to enhance LLM-integrated application security.

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

The paper explores the vulnerabilities of LLM-integrated applications to prompt injection attacks. These attacks compromise inputs to induce LLMs to produce results desired by an attacker. The paper introduces a formal framework for both attacks and defenses and provides comprehensive evaluation results.

Introduction to Prompt Injection Attacks

Prompt injection attacks exploit LLM vulnerabilities by altering input prompts such that the output is controlled by the attacker. Figure 1

Figure 1: Illustration of LLM-integrated Application under attack. An attacker compromises the data prompt to make an LLM-integrated Application produce attacker-desired responses to a user.

Attack and Defense Frameworks

The proposed frameworks formalize existing attack strategies as special cases, allowing systematic design and evaluation of both novel and combined attacks, as well as potential defenses.

Attack Framework

This framework categorizes prompt injection attack strategies, such as:

  • Naive Attack: Straightforward concatenation
  • Escape Characters: Misleading the LLM with special characters
  • Context Ignoring: Using phrases to dismiss prior instructions
  • Fake Completion: Confusing the LLM with false task completions
  • Combined Attack: Integrating multiple strategies for enhanced efficacy Figure 2

Figure 2

Figure 2

Figure 2: Comparing different attacks for different target tasks, demonstrating varied effectiveness.

Defense Framework

A comprehensive defense strategy combines both prevention (e.g., data isolation) and detection (e.g., perplexity-based methods), with proactive detection being particularly effective. Figure 3

Figure 3: Examples of data prompt isolation, instructional prevention, and sandwich prevention.

Experimental Evaluation

Attack Effectiveness

The paper evaluates attacks using ten LLMs across seven tasks. The Combined Attack, merging various attack strategies, consistently shows high efficacy. Figure 4

Figure 4

Figure 4

Figure 4: Impact of the number of in-context learning examples on Combined Attack across tasks.

Defense Effectiveness

Proactive detection consistently detects attacks with minimal impact on task utility. However, other defenses such as paraphrasing, though effective, could degrade the performance utility under non-attack scenarios.

Conclusions

This research advances the understanding of LLM vulnerabilities and defenses in real-world applications, highlighting significant progress in developing frameworks for prompt injection attacks. Future work will involve optimizing attack strategies further and improving defenses to recover from detected compromises efficiently. These insights are pivotal for enhancing the security and reliability of LLM-integrated applications in critical settings.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Github Logo Streamline Icon: https://streamlinehq.com
X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 3 tweets and received 42 likes.

Upgrade to Pro to view all of the tweets about this paper:

Reddit Logo Streamline Icon: https://streamlinehq.com