
Learning to Compress Prompt in Natural Language Formats (2402.18700v2)

Published 28 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs excel at a wide range of natural language processing tasks, but their abilities are constrained by degraded performance on long contexts, slow inference speed, and high computation cost. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework that compresses original prompts into NL-formatted Capsule Prompts while maintaining prompt utility and transferability. Specifically, to tackle the first challenge, Nano-Capsulator is optimized with a reward function that interacts with the proposed semantics-preserving loss. To address the second challenge, the reward function additionally features length constraints. Experimental results demonstrate that Capsule Prompts can reduce the original length by 81.4%, decrease inference latency by up to 4.5x, and save 80.1% of budget overheads, while providing transferability across diverse LLMs and different datasets.


Summary

  • The paper proposes Nano-Capsulator, which compresses lengthy prompts while preserving semantic integrity and task utility.
  • It employs semantics-preserving loss and utility-based reward functions to balance brevity and performance.
  • The approach achieves up to 81.4% prompt length reduction, 4.5x latency improvement, and 80.1% API cost savings.

Learning to Compress Prompt in Natural Language Formats

Introduction

The paper "Learning to Compress Prompt in Natural Language Formats" focuses on addressing limitations of LLMs regarding prompt length constraints and transferability. The authors propose the Nano-Capsulator framework to compress long prompts into Natural Language (NL) formats while preserving their utility and ensuring transferability across different LLMs.

Nano-Capsulator Framework

The Nano-Capsulator framework compresses lengthy LLM prompts into concise NL-formatted prompts termed Capsule Prompts. This approach tackles a key weakness of previous soft prompt-based methods, which struggle to transfer across LLMs, especially API-based ones. Nano-Capsulator is optimized with a semantics-preserving loss and a utility-preserving reward function, described below (Figure 1).

Figure 1: Illustration of the Nano-Capsulator training framework. Nano-Capsulator compresses the long prompt while preserving its semantics and utility.

Prompt Compression Mechanics

To compress prompts effectively, Nano-Capsulator utilizes two main strategies:

  1. Semantics-Preserving Loss Function: This ensures that the compressed Capsule retains the essential semantic information of the original prompt, using embeddings to enforce high similarity between the original and compressed prompts.
  2. Utility-Preserving Reward Function: This function incorporates a length-constraint penalty and evaluates utility retention by comparing downstream task performance against the original prompt; the reward adjusts dynamically to provide better learning feedback (Figure 2). A minimal sketch of both objectives follows the figure.

    Figure 2: An example of successful prompt compression with NL formats. The compressed NL-formatted prompt (green) is shorter while maintaining the transferability and utility of the long prompt (red).
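
The paper's reference code is not included here, so the following is a minimal sketch of how the two objectives could be computed, assuming an off-the-shelf sentence embedder for the semantic term; the embedder choice, the word-based length budget, and `utility_score` (a stand-in for any downstream metric, e.g. QA accuracy) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the Nano-Capsulator objectives (not the authors' code).
from typing import Callable
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
MAX_WORDS = 120  # assumed length budget

def semantic_loss(original: str, capsule: str) -> float:
    """Semantics-preserving term: 1 - cosine similarity of prompt embeddings."""
    e_orig = embedder.encode(original, convert_to_tensor=True)
    e_caps = embedder.encode(capsule, convert_to_tensor=True)
    return 1.0 - util.cos_sim(e_orig, e_caps).item()

def length_penalty(capsule: str) -> float:
    """Penalty that grows once the capsule exceeds the length budget."""
    n_words = len(capsule.split())  # crude word count for illustration
    return max(0.0, n_words - MAX_WORDS) / MAX_WORDS

def utility_reward(original: str, capsule: str,
                   utility_score: Callable[[str], float]) -> float:
    """Utility-preserving reward: downstream score of the capsule relative to
    the original prompt, discounted by the length penalty."""
    return utility_score(capsule) - utility_score(original) - length_penalty(capsule)
```

Because the reward is computed on generated text, it is not differentiable; one way to read the abstract's "reward function that interacts with the semantics-preserving loss" is that the reward scales or gates the differentiable semantic term while fine-tuning the compressor, though the exact coupling is the paper's design choice.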

Experimental Evaluation

The evaluation involved diverse datasets and LLMs, illustrating the effectiveness and transferability of the Nano-Capsulator framework across varying scenarios. The experiments highlight key performance improvements:

  • Compression Efficiency: Nano-Capsulator can reduce prompt lengths by up to 81.4% while retaining semantic utility and applicability across multiple LLMs.
  • Cost and Latency Reduction: The framework decreases inference latency by up to 4.5x and reduces API cost overhead by up to 80.1% (Figure 3); a rough cost estimate follows the figure.

    Figure 3: Evaluation of Nano-Capsulator's transferability across unseen datasets.
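
Under per-token API pricing, input cost scales roughly linearly with prompt length, so the reported 80.1% budget savings is consistent with the 81.4% length reduction. The arithmetic below is a back-of-the-envelope illustration; the price and token counts are placeholder assumptions, not figures from the paper.

```python
# Back-of-the-envelope cost arithmetic (price and token counts are placeholders).
PRICE_PER_1K_INPUT_TOKENS = 0.01  # assumed API price in USD
original_tokens = 2_000           # assumed long-prompt length
length_reduction = 0.814          # reduction reported in the paper

compressed_tokens = original_tokens * (1 - length_reduction)
cost_before = original_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
cost_after = compressed_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
print(f"{compressed_tokens:.0f} tokens per call, "
      f"${cost_before:.4f} -> ${cost_after:.4f} "
      f"({1 - cost_after / cost_before:.1%} saved on input tokens)")
```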

Comparison with Existing Methods

In comparisons with zero-shot summarization and soft prompt compression methods, Nano-Capsulator better maintains prompt quality and utility. This is attributed to jointly optimizing length and semantic preservation, which improves adaptability and generalization across tasks and models (Figure 4).

Figure 4: Comparison results of Capsule and Zero-shot Summarization on GSM8K dataset (left) and MultiRC dataset (right).
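
For context, the zero-shot summarization baseline amounts to asking an instruction-tuned LLM to shorten the prompt directly, with no learned compressor. The sketch below uses the OpenAI Python client; the model name and instruction wording are assumptions, not the paper's exact setup.

```python
# Sketch of a zero-shot summarization baseline (model and instruction assumed).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_compress(prompt: str, max_words: int = 120) -> str:
    """Compress a prompt by asking an LLM to summarize it under a word budget."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Summarize the following prompt in at most {max_words} words, "
                f"keeping every detail needed to answer it:\n\n{prompt}"
            ),
        }],
    )
    return response.choices[0].message.content
```

Unlike this baseline, Nano-Capsulator's reward explicitly checks downstream task performance, which is what the Figure 4 comparison credits for its advantage.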

Ablation Studies and Impact Analysis

To understand the contribution of each component, comprehensive ablation studies were conducted. Key findings include:

  • Reward Function Impact: The reward function substantially contributes to utility retention by penalizing performance loss in compressed prompts.
  • Effect of Length Constraints: Varying the length constraint revealed optimal compression rates for different models, illustrating the trade-off between brevity and information retention (Figure 5).

Figure 5: Ablation studies comparing Capsule with GPT-3.5-Turbo summarization on the CSQA and GSM8K datasets (left), and the contribution of the reward function.

Figure 6: Impact of prompt length on Vicuna-13B (left) and Claude2 (right) on the TriviaQA dataset.

Conclusion

The Nano-Capsulator framework effectively balances prompt brevity, utility preservation, and model transferability, addressing critical challenges associated with LLMs. Its application spans various domains and datasets without necessitating retraining, making it a versatile tool for optimizing LLM efficiency. Future research might focus on extending this method to support more complex interactions and integrating more sophisticated reward mechanisms to further enhance adaptability and robustness.
