Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers (2402.17564v3)
Abstract: Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective on the design of LLM-based prompt optimizers by drawing an analogy with gradient-based model optimizers. To connect the two approaches, we identify two pivotal factors in model parameter learning: the update direction and the update method. By systematically analyzing a rich set of improvement strategies along these two dimensions, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, GPO first retrieves relevant prompts from the optimization trajectory to determine the update direction. It then applies a generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay schedule. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 62.6% on MMLU compared to baseline methods. The code is available at https://github.com/RUCAIBox/GPO.
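The loop the abstract describes maps naturally onto code. Below is a minimal sketch of such a gradient-inspired prompt-optimization loop, not the authors' implementation: `optimizer_llm`, `score_fn`, and `retrieve_fn` are hypothetical stand-ins for the refinement model, a dev-set evaluator, and the trajectory retriever, and the edit-distance cap assumes the standard cosine-annealing formula used for learning-rate schedules.

```python
import math

def cosine_decay_edit_distance(step: int, total_steps: int,
                               max_edits: int = 10, min_edits: int = 1) -> int:
    """Cosine-annealed cap on how many edits the optimizer may make.

    Mirrors a cosine learning-rate schedule: large edits early in the
    run, progressively smaller refinements near the end.
    """
    cosine = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return round(min_edits + (max_edits - min_edits) * cosine)


def optimize_prompt(initial_prompt, optimizer_llm, score_fn,
                    retrieve_fn, total_steps=20, k=3):
    """Skeleton of a gradient-inspired prompt-optimization loop.

    Hypothetical interfaces (not the paper's API):
      optimizer_llm(meta_prompt) -> str   : generates a refined prompt
      score_fn(prompt) -> float           : task accuracy on a dev set
      retrieve_fn(trajectory, k) -> list  : k most relevant past prompts
    """
    trajectory = [(initial_prompt, score_fn(initial_prompt))]
    for step in range(total_steps):
        # Update direction: relevant (prompt, score) pairs retrieved
        # from the optimization trajectory.
        demos = retrieve_fn(trajectory, k)
        # Update method: generation-based refinement, with the allowed
        # edit distance decayed on a cosine schedule.
        max_edits = cosine_decay_edit_distance(step, total_steps)
        meta_prompt = (
            "Here are past prompts with their scores:\n"
            + "\n".join(f"{p!r} -> {s:.3f}" for p, s in demos)
            + f"\nWrite a new prompt that scores higher, "
              f"changing at most {max_edits} words."
        )
        candidate = optimizer_llm(meta_prompt)
        trajectory.append((candidate, score_fn(candidate)))
    # Return the best-scoring prompt seen over the whole trajectory.
    return max(trajectory, key=lambda ps: ps[1])[0]
```

In this analogy the retrieved trajectory plays the role of the gradient (it points the optimizer LLM toward higher-scoring prompts), while the decaying edit-distance cap plays the role of a learning rate: broad exploration early, local refinement late.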