Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers (2402.17564v3)
Abstract: Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective on the design of LLM-based prompt optimizers by drawing an analogy with gradient-based model optimizers. To connect the two approaches, we identify two pivotal factors in model parameter learning: the update direction and the update method. By systematically analyzing a rich set of improvement strategies along these two dimensions, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, GPO first retrieves relevant prompts from the optimization trajectory to determine the update direction. It then applies a generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay schedule. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 62.6% on MMLU compared to baseline methods. The code is available at https://github.com/RUCAIBox/GPO.
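The loop the abstract describes maps naturally onto code. Below is a minimal sketch of such a gradient-inspired prompt-optimization loop, not the authors' implementation: `optimizer_llm`, `score_fn`, and `retrieve_fn` are hypothetical stand-ins for the refinement model, a dev-set evaluator, and the trajectory retriever, and the edit-distance cap assumes the standard cosine-annealing formula used for learning-rate schedules.

```python
import math

def cosine_decay_edit_distance(step: int, total_steps: int,
                               max_edits: int = 10, min_edits: int = 1) -> int:
    """Cosine-annealed cap on how many edits the optimizer may make.

    Mirrors a cosine learning-rate schedule: large edits early in the
    run, progressively smaller refinements near the end.
    """
    cosine = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return round(min_edits + (max_edits - min_edits) * cosine)


def optimize_prompt(initial_prompt, optimizer_llm, score_fn,
                    retrieve_fn, total_steps=20, k=3):
    """Skeleton of a gradient-inspired prompt-optimization loop.

    Hypothetical interfaces (not the paper's API):
      optimizer_llm(meta_prompt) -> str   : generates a refined prompt
      score_fn(prompt) -> float           : task accuracy on a dev set
      retrieve_fn(trajectory, k) -> list  : k most relevant past prompts
    """
    trajectory = [(initial_prompt, score_fn(initial_prompt))]
    for step in range(total_steps):
        # Update direction: relevant (prompt, score) pairs retrieved
        # from the optimization trajectory.
        demos = retrieve_fn(trajectory, k)
        # Update method: generation-based refinement, with the allowed
        # edit distance decayed on a cosine schedule.
        max_edits = cosine_decay_edit_distance(step, total_steps)
        meta_prompt = (
            "Here are past prompts with their scores:\n"
            + "\n".join(f"{p!r} -> {s:.3f}" for p, s in demos)
            + f"\nWrite a new prompt that scores higher, "
              f"changing at most {max_edits} words."
        )
        candidate = optimizer_llm(meta_prompt)
        trajectory.append((candidate, score_fn(candidate)))
    # Return the best-scoring prompt seen over the whole trajectory.
    return max(trajectory, key=lambda ps: ps[1])[0]
```

In this analogy the retrieved trajectory plays the role of the gradient (it points the optimizer LLM toward higher-scoring prompts), while the decaying edit-distance cap plays the role of a learning rate: broad exploration early, local refinement late.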