Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression (2404.00489v2)

Published 30 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have shown exceptional abilities across a wide range of natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts yield substandard results in terms of the readability and interpretability of the compressed prompt, with a detrimental impact on prompt utility. To address this, we propose Prompt-SAW (Prompt compresSion via Relation AWare graphs), an effective strategy for prompt compression over task-agnostic and task-aware prompts. Prompt-SAW uses the prompt's textual information to build a graph and then extracts key information elements from the graph to produce the compressed prompt. We also propose GSM8K-aug, an extended version of the existing GSM8K benchmark for task-agnostic prompts, to provide a comprehensive evaluation platform. Experimental evaluation on benchmark datasets shows that prompts compressed by Prompt-SAW are not only more readable, but they also outperform the best-performing baseline models by up to 10.1 and 77.1, respectively, in task-agnostic and task-aware settings, while compressing the original prompt text by 34.9 and 56.7.
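As a rough illustration of the graph-then-select idea the abstract describes, the sketch below builds a graph over prompt units, scores each unit by its connectivity, and keeps the top-scoring units under a length budget. This is a minimal sketch under stated assumptions, not the paper's implementation: the sentence-level units, the Jaccard-overlap edges, and the degree-centrality score are all simplifying stand-ins for the paper's relation-aware graph elements.

```python
# Minimal sketch of graph-based prompt compression in the spirit of
# Prompt-SAW. ASSUMPTIONS: units are sentences (the paper uses
# relation-aware graph elements), edges are word-overlap weights, and
# importance is a crude weighted-degree centrality.
import re
from collections import defaultdict

def split_units(prompt: str) -> list[str]:
    # Naive sentence split; the paper operates on finer-grained elements.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]

def build_graph(units: list[str]) -> dict[int, dict[int, float]]:
    # Edge weight = Jaccard overlap between the units' word sets.
    words = [set(u.lower().split()) for u in units]
    graph: dict[int, dict[int, float]] = defaultdict(dict)
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            inter = words[i] & words[j]
            if inter:
                w = len(inter) / len(words[i] | words[j])
                graph[i][j] = graph[j][i] = w
    return graph

def compress(prompt: str, budget_ratio: float = 0.5) -> str:
    units = split_units(prompt)
    graph = build_graph(units)
    # Score each unit by its weighted degree (a centrality proxy).
    scores = {i: sum(graph[i].values()) for i in range(len(units))}
    budget = int(len(prompt) * budget_ratio)
    kept, used = [], 0
    for i in sorted(scores, key=scores.get, reverse=True):
        if used + len(units[i]) <= budget:
            kept.append(i)
            used += len(units[i]) + 1
    # Re-emit kept units in their original order for readability.
    return " ".join(units[i] for i in sorted(kept))
```

In the paper's task-aware setting, the importance score would additionally be conditioned on the downstream task, whereas the task-agnostic setting scores elements from the prompt's own structure alone, as the degree-based heuristic above loosely does.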
