Efficient Prompting Methods for Large Language Models: A Survey (2404.01077v2)
Abstract: Prompting is a mainstream paradigm for adapting LLMs to specific natural language processing tasks without modifying internal parameters. Because the model itself is left untouched, detailed supplementary knowledge must be integrated into external prompts, which inevitably adds extra human effort and computational burden in practical applications. Efficient Prompting Methods, which aim to mitigate this resource consumption, have therefore attracted wide attention. We use high-level mathematical formulations to discuss in depth Automatic Prompt Engineering for different prompt components and Prompt Compression in continuous and discrete spaces. Finally, we highlight promising future directions to inspire researchers interested in this field.
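As one concrete illustration of the discrete-space compression idea mentioned in the abstract, the sketch below scores each prompt token by its self-information under a small causal language model and keeps only the most informative ones. This is a minimal sketch, not the procedure of any specific surveyed method; the choice of "gpt2" as the scoring model, the keep-ratio, and the top-k filtering heuristic are illustrative assumptions.

```python
# Minimal sketch: discrete prompt compression by self-information filtering.
# Assumptions (not from the survey): "gpt2" as the scoring model, a fixed
# keep-ratio, and token-level top-k filtering as the compression heuristic.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Drop the least informative tokens, keeping roughly `keep_ratio` of them."""
    enc = tokenizer(prompt, return_tensors="pt")
    ids = enc["input_ids"][0]
    if len(ids) < 2:
        return prompt  # nothing to score for very short prompts
    with torch.no_grad():
        logits = model(**enc).logits[0]               # (seq_len, vocab)
    # Self-information of token t given its prefix: -log p(x_t | x_<t).
    log_probs = torch.log_softmax(logits[:-1], dim=-1)
    surprisal = -log_probs[torch.arange(len(ids) - 1), ids[1:]]
    # Keep the k most surprising tokens; always keep the first token,
    # since it has no prefix to be scored against.
    k = max(1, int(keep_ratio * (len(ids) - 1)))
    keep = torch.topk(surprisal, k).indices + 1       # shift to token positions
    keep = torch.cat([torch.tensor([0]), keep.sort().values])
    return tokenizer.decode(ids[keep])

if __name__ == "__main__":
    long_prompt = ("Answer the question using the passage. The passage describes "
                   "the history of the Transformer architecture in detail ...")
    print(compress_prompt(long_prompt, keep_ratio=0.5))
```

Continuous-space compression, by contrast, would replace the dropped text with a short sequence of learned soft tokens rather than a filtered subset of the original tokens; the survey treats both families.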