- The paper introduces a formal framework that categorizes prompt injection attacks and proposes systematic defenses.
- It demonstrates that combined attacks yield high efficacy across various tasks while proactive detection minimizes performance impact.
- Experimental evaluations across ten LLMs and seven tasks affirm the framework’s potential to enhance LLM-integrated application security.
The paper explores the vulnerabilities of LLM-integrated applications to prompt injection attacks. These attacks compromise inputs to induce LLMs to produce results desired by an attacker. The paper introduces a formal framework for both attacks and defenses and provides comprehensive evaluation results.
Introduction to Prompt Injection Attacks
Prompt injection attacks exploit LLM vulnerabilities by altering input prompts such that the output is controlled by the attacker.
Figure 1: Illustration of LLM-integrated Application under attack. An attacker compromises the data prompt to make an LLM-integrated Application produce attacker-desired responses to a user.
Attack and Defense Frameworks
The proposed frameworks formalize existing attack strategies as special cases, allowing systematic design and evaluation of both novel and combined attacks, as well as potential defenses.
Attack Framework
This framework categorizes prompt injection attack strategies, such as:
- Naive Attack: Straightforward concatenation
- Escape Characters: Misleading the LLM with special characters
- Context Ignoring: Using phrases to dismiss prior instructions
- Fake Completion: Confusing the LLM with false task completions
- Combined Attack: Integrating multiple strategies for enhanced efficacy


Figure 2: Comparing different attacks for different target tasks, demonstrating varied effectiveness.
Defense Framework
A comprehensive defense strategy combines both prevention (e.g., data isolation) and detection (e.g., perplexity-based methods), with proactive detection being particularly effective.
Figure 3: Examples of data prompt isolation, instructional prevention, and sandwich prevention.
Experimental Evaluation
Attack Effectiveness
The paper evaluates attacks using ten LLMs across seven tasks. The Combined Attack, merging various attack strategies, consistently shows high efficacy.


Figure 4: Impact of the number of in-context learning examples on Combined Attack across tasks.
Defense Effectiveness
Proactive detection consistently detects attacks with minimal impact on task utility. However, other defenses such as paraphrasing, though effective, could degrade the performance utility under non-attack scenarios.
Conclusions
This research advances the understanding of LLM vulnerabilities and defenses in real-world applications, highlighting significant progress in developing frameworks for prompt injection attacks. Future work will involve optimizing attack strategies further and improving defenses to recover from detected compromises efficiently. These insights are pivotal for enhancing the security and reliability of LLM-integrated applications in critical settings.