PRSA: Prompt Stealing Attacks against Real-World Prompt Services (2402.19200v3)
Abstract: Recently, large language models (LLMs) have garnered widespread attention for their exceptional capabilities. Prompts are central to the functionality and performance of LLMs, making them highly valuable assets. The increasing reliance on high-quality prompts has driven significant growth in prompt services. However, this growth also expands the potential for prompt leakage, increasing the risk that attackers could replicate original functionalities, create competing products, and severely infringe on developers' intellectual property. Despite these risks, prompt leakage in real-world prompt services remains underexplored. In this paper, we present PRSA, a practical attack framework designed for prompt stealing. PRSA infers the detailed intent of a prompt from very limited input-output analysis and can generate stolen prompts that replicate the original functionality. Extensive evaluations demonstrate PRSA's effectiveness across two main types of real-world prompt services. Specifically, compared to previous works, it improves the attack success rate from 17.8% to 46.1% in prompt marketplaces and from 39% to 52% in LLM application stores. Notably, in the attack on "Math", one of the most popular educational applications in OpenAI's GPT Store with over 1 million conversations, PRSA uncovered a hidden Easter egg that had not been revealed previously. In addition, our analysis reveals that higher mutual information between a prompt and its output correlates with an increased risk of leakage. This insight guides the design and evaluation of two potential defenses against the security threats posed by PRSA. We have reported these findings to the prompt service vendors, including PromptBase and OpenAI, and are actively collaborating with them to implement defensive measures.
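The leakage finding can be made concrete with the standard definition of mutual information, I(X;Y) = Σ p(x,y) log [ p(x,y) / (p(x)p(y)) ]: the more the observable output reveals about the hidden prompt, the easier the prompt is to reconstruct. The sketch below is illustrative only and is not the paper's estimator; it assumes a simple plug-in estimate over paired binary observations, where X marks whether a hypothetical prompt feature (e.g., a tone directive) is present and Y marks whether that feature shows up in a sampled output.

```python
import math
from collections import Counter

def plugin_mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(pairs)
    joint = Counter(pairs)                 # empirical p(x, y)
    px = Counter(x for x, _ in pairs)      # empirical p(x)
    py = Counter(y for _, y in pairs)      # empirical p(y)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with counts normalized by n
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

# Hypothetical toy data: 1/0 = prompt feature present/absent (X) and
# reflected/not reflected in the output (Y). Under the paper's finding,
# a higher I(X;Y) would indicate a higher prompt-leakage risk.
pairs = [(1, 1), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
print(f"estimated I(X;Y) = {plugin_mutual_information(pairs):.3f} nats")
```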