A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications (2402.07927v2)
Abstract: Prompt engineering has emerged as an indispensable technique for extending the capabilities of LLMs and vision-LLMs (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters. Rather than updating the model parameters, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors solely based on the given prompt. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question-answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey paper addresses the gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and table summarizing datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.
- Exploring visual prompts for adapting large-scale models. arXiv preprint arXiv:2203.17274, 2022.
- Language models are few-shot learners, 2020.
- Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588, 2022.
- Unleashing the potential of prompt engineering in large language models: a comprehensive review. arXiv preprint arXiv:2310.14735, 2023.
- Contrastive chain-of-thought prompting. arXiv preprint arXiv:2311.09277, 2023.
- Rephrase and respond: Let large language models ask better questions for themselves. arXiv preprint arXiv:2311.04205, 2023.
- Chain-of-verification reduces hallucination in large language models. arXiv preprint arXiv:2309.11495, 2023.
- Active prompting with chain-of-thought for large language models. arXiv preprint arXiv:2302.12246, 2023.
- Chain-of-symbol prompting elicits planning in large language models, 2023.
- Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
- Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760, 2023.
- Chain of code: Reasoning with a language model-augmented code emulator. arXiv preprint arXiv:2312.04474, 2023.
- Structured chain-of-thought prompting for code generation. arXiv preprint arXiv:2305.06599, 2023.
- Chain-of-knowledge: Grounding large language models via dynamic knowledge adapting over heterogeneous sources, 2023.
- Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023.
- Jieyi Long. Large language model guided tree-of-thought. arXiv preprint arXiv:2305.08291, 2023.
- Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114, 2021.
- Art: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014, 2023.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- A comprehensive survey of hallucination mitigation techniques in large language models. arXiv preprint arXiv:2401.01313, 2024.
- Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.
- Chain-of-table: Evolving tables in the reasoning chain for table understanding, 2024.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
- System 2 attention (is something you might need too). arXiv preprint arXiv:2311.11829, 2023.
- Visual chatgpt: Talking, drawing and editing with visual foundation models, 2023.
- Large language models as optimizers. arXiv preprint arXiv:2309.03409, 2023.
- React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
- Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023.
- Beyond chain-of-thought, effective graph-of-thought reasoning in large language models. arXiv preprint arXiv:2305.16582, 2023.
- Chain-of-note: Enhancing robustness in retrieval-augmented language models, 2023.
- Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493, 2022.
- Enhancing zero-shot chain-of-thought reasoning in large language models through logic, 2023.
- Take a step back: evoking reasoning via abstraction in large language models. arXiv preprint arXiv:2310.06117, 2023.
- Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910, 2022.
- Thread of thought unraveling chaotic contexts. arXiv preprint arXiv:2311.08734, 2023.
Explain it Like I'm 14
Overview
This paper is about “prompt engineering,” which is the art of writing good instructions for large language models (LLMs, like ChatGPT) and vision-language models (VLMs), AIs that understand both pictures and text. The main idea is that, instead of retraining a huge AI every time you want it to do a new task, you can guide it with well-crafted prompts so it behaves how you want.
Think of an LLM as a very smart student who has read a lot but can be a bit literal. A prompt is the way you ask the student a question and give directions: clear instructions can make the student do better without changing how the student thinks inside.
Goals and Research Questions
The paper aims to:
- Organize and explain the many different prompt engineering techniques people have invented.
- Show where and how these techniques are used (for tasks like reasoning, answering questions, writing code, and reducing mistakes).
- Compare their strengths and weaknesses and the kinds of models and datasets they use.
- Present a handy “map” (taxonomy) and a summary table so researchers and users can choose the right prompting methods for their needs.
- Point out open problems and future directions for prompt engineering.
How Did They Do It? (Methods)
This is a survey paper, which means the authors did not run new experiments of their own. Instead, they:
- Collected and read many research papers on 29 different prompting techniques.
- Grouped these techniques by what they help with (such as reasoning, handling emotions, using tools, or writing code).
- Summarized how each technique works in everyday language, the tasks it’s used for, the models involved (like GPT-3, GPT-4, Llama, T5), and the datasets used to test them (benchmarks where models are scored).
- Discussed pros and cons of each technique.
- Built a taxonomy (a structured guide) and a table that make it easy to compare methods.
If “taxonomy” sounds fancy, think of it like a well-labeled closet: each shelf is a category (reasoning, reducing mistakes, etc.), and each item (technique) is placed where it fits best.
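To make the closet analogy a bit more concrete, below is a minimal sketch of how such a grouping might be written down as a simple data structure. The category names and memberships here are illustrative assumptions based on the techniques discussed in this summary, not the paper's official taxonomy.

```python
# Hypothetical, simplified grouping of prompting techniques by what they help with.
# The labels and membership are illustrative, not the survey's exact taxonomy.
taxonomy = {
    "reasoning and logic": ["Chain-of-Thought", "Auto-CoT", "Self-Consistency",
                            "Tree-of-Thoughts", "Graph-of-Thoughts"],
    "reducing hallucinations": ["RAG", "Chain-of-Verification", "Chain-of-Note"],
    "tool use and action": ["ReAct", "ART"],
    "code generation": ["Program-of-Thoughts", "Chain-of-Code", "Structured CoT"],
    "prompt optimization": ["APE", "OPRO"],
}

# Looking up which "shelf" a technique sits on:
for category, techniques in taxonomy.items():
    if "ReAct" in techniques:
        print(f"ReAct sits on the '{category}' shelf")
```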
Main Findings and Why They Matter
The big takeaway is that prompt engineering can dramatically improve what AI models can do, often without retraining them. Here are a few standout techniques:
- Zero-shot and few-shot prompting:
- Zero-shot means you just give clear instructions, no examples. It’s like telling a student, “Do this new type of math problem,” and they try using what they already know.
- Few-shot means you include a few example Q&As in the prompt. Even a small number of examples can help models do much better, especially on tricky tasks.
- Chain-of-Thought (CoT): You ask the model to “show its steps” like solving a math problem line by line. This often boosts accuracy in reasoning tasks.
- Auto-CoT and Self-Consistency:
- Auto-CoT automatically generates examples of reasoning steps so you don’t have to handwrite them.
- Self-consistency has the model try multiple solution paths and then pick the answer that comes up most often, like taking a vote among several attempts (a minimal code sketch of this idea appears right after this list).
- Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT):
- These go beyond simple step-by-step thinking. The model explores different branches or networks of ideas, then chooses the best path forward—similar to brainstorming multiple routes to a solution and picking the best one.
- RAG (Retrieval-Augmented Generation): The model “looks things up” in a knowledge base and uses that information in its answer. This helps it be more accurate and current, reducing made-up facts (hallucinations).
- ReAct and ART (Reason + Action/Tools): The model both thinks and takes actions (like calling tools, searching, or using a calculator) in a loop. This helps it handle complex, real-world tasks better.
- CoVe (Chain-of-Verification): The model plans questions to check its own work, answers those questions, and then revises its final answer—like a self-checking worksheet.
- EmotionPrompt: Adding short motivational or emotional cues can surprisingly help the model respond better on some tasks—like giving the AI a quick pep talk.
- PoT and CoC (Program-of-Thoughts and Chain-of-Code): The model writes code or pseudocode to think precisely through numerical or logical problems, cutting down on errors.
- SCoT (Structured Chain-of-Thought): For code generation, the model follows clear program structures (sequence, loops, branches), which leads to more accurate code than plain natural-language reasoning.
- APE (Automatic Prompt Engineer) and OPRO (Optimization by Prompting):
- APE automatically creates and selects effective instructions.
- OPRO uses natural language prompts to improve solutions over iterations, acting like an optimizer—not just answering questions, but improving strategies to get better results.
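To make the few-shot, chain-of-thought, and self-consistency ideas concrete, here is a minimal sketch of how they can be combined. This is an illustration under stated assumptions, not code from the paper: `ask_model` is a hypothetical stand-in for whatever LLM API you use, and the worked example in the prompt is invented.

```python
import re
from collections import Counter

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; assume it returns free text
    that ends in a line like 'Answer: <number>'."""
    raise NotImplementedError("plug in your own model call here")

# Few-shot chain-of-thought prompt: one worked example that shows its steps,
# followed by the new question we actually want solved.
PROMPT = """Q: A shop sells pens in packs of 4. How many pens are in 6 packs?
A: Each pack has 4 pens, so 6 packs have 6 * 4 = 24 pens. Answer: 24

Q: A train travels 60 km per hour for 3 hours. How far does it go?
A:"""

def extract_answer(completion: str) -> str | None:
    """Pull the final 'Answer: ...' value out of the model's reasoning."""
    match = re.search(r"Answer:\s*([-\d.]+)", completion)
    return match.group(1) if match else None

def self_consistent_answer(prompt: str, samples: int = 5) -> str | None:
    """Sample several reasoning paths and vote for the most common final answer."""
    answers = [extract_answer(ask_model(prompt)) for _ in range(samples)]
    answers = [a for a in answers if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None
```

In practice the repeated calls would use a non-zero sampling temperature so the reasoning paths actually differ; that detail is omitted here.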
Across many benchmarks (like GSM8K for math word problems, TriviaQA for question answering, HumanEval/MBPP for coding), these techniques often show strong improvements. Examples include big jumps in success rates when using ToT for puzzles, better factual accuracy with RAG, and clearer reasoning with CoT, Auto-CoT, and self-consistency.
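As a companion to the RAG point above, here is a minimal, self-contained sketch of the retrieve-then-generate pattern. The retrieval step is a toy word-overlap scorer, the documents are invented, and `ask_model` again stands in for a real LLM call; real systems typically use embedding search over a proper knowledge base, but the shape of the flow is the same.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call."""
    raise NotImplementedError("plug in your own model call here")

# Toy "knowledge base": a few snippets the model is allowed to look things up in.
DOCUMENTS = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "GSM8K is a benchmark of grade-school math word problems.",
    "Retrieval-augmented generation pairs a retriever with a text generator.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by crude word overlap with the question and keep the top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def rag_answer(question: str) -> str:
    """Put the retrieved snippets into the prompt so the answer stays grounded in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, DOCUMENTS))
    prompt = (
        "Use only the context below to answer the question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return ask_model(prompt)
```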
Implications and Impact
- For everyday users: Better prompts mean better answers—from clearer explanations and fewer mistakes to smarter problem-solving. You don’t need to retrain the model; you can just ask better.
- For developers and researchers: This survey acts like a guidebook—helping you pick the right technique for your task (reasoning, coding, using tools, reducing hallucinations, etc.) and understand trade-offs.
- For the future of AI: Prompt engineering opens doors to flexible, powerful systems that can reason, check themselves, and use tools. The paper also highlights challenges such as bias, hallucinations, and understanding how models think. It points to promising directions like combining multiple techniques (hybrid prompts), meta-learning, and more careful, ethical use.
In short, prompt engineering is a practical, powerful way to get more from AI models today—and this paper shows how the field is growing fast, where it’s working well, and where we need to improve next.