A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications (2402.07927v2)

Published 5 Feb 2024 in cs.AI, cs.CL, and cs.HC

Abstract: Prompt engineering has emerged as an indispensable technique for extending the capabilities of large language models (LLMs) and vision-language models (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters. Rather than updating the model parameters, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors solely based on the given prompt. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question-answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey paper addresses the gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and table summarizing datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.


Summary

  • The paper presents a comprehensive taxonomy of prompt engineering techniques that extend LLM capabilities without retraining underlying models.
  • It details advanced methods like chain-of-thought prompting and retrieval-augmented generation to enhance reasoning and reduce hallucinations.
  • It outlines future directions, including meta-learning approaches and ethical considerations, to guide next-generation prompt engineering research.

A Systematic Survey of Prompt Engineering in LLMs: Techniques and Applications

Introduction

Prompt engineering has emerged as a central technique for enhancing and adapting pre-trained LLMs and vision-language models (VLMs) across a broad spectrum of tasks without modifying their underlying parameters. This paper presents a comprehensive survey of the field, delineating the evolution, methodologies, and applications of prompt engineering, ranging from zero-shot and few-shot learning to complex reasoning methods. This overview addresses the pressing need for systematic organization within this burgeoning field and aims to pave the way for future research by identifying both current strengths and potential gaps.

Overview of Prompt Engineering Techniques

The taxonomy introduced in this survey categorizes prompt engineering techniques into several critical areas, emphasizing their application domains and contributions to advancing LLM capabilities.

New Tasks Without Extensive Training

  • Zero-shot prompting leverages a model's innate knowledge to tackle tasks without any task-specific examples, while few-shot prompting supplies a small number of examples in the prompt itself. Both are foundational techniques that extend model applicability without significant data or computational overhead; a minimal sketch of the two prompt formats follows.
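
As a concrete illustration, here is a minimal sketch of how zero-shot and few-shot prompts are assembled. The `call_model` function is a hypothetical stand-in for any LLM completion API, and the Input/Output template is an illustrative assumption, not a format prescribed by the survey.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with your LLM provider's completion call.
    raise NotImplementedError

def zero_shot(instruction: str, query: str) -> str:
    # Zero-shot: the instruction alone, no worked examples.
    return call_model(f"{instruction}\n\nInput: {query}\nOutput:")

def few_shot(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    # Few-shot: a handful of input/output demonstrations precede the query.
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return call_model(f"{instruction}\n\n{demos}\n\n"
                      f"Input: {query}\nOutput:")
```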

Reasoning and Logic

  • Advanced techniques like Chain-of-Thought (CoT) prompting guide models to generate step-by-step reasoning, emulating a deliberate thought process for complex problem-solving tasks. Innovations like Automatic Chain-of-Thought (Auto-CoT) and Self-Consistency refine this idea further, introducing automation and diversity in reasoning generation to improve performance and reliability; a minimal sketch of self-consistent CoT follows.
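
A minimal sketch of CoT combined with self-consistency, assuming a hypothetical `call_model` stub that samples completions at nonzero temperature and a task whose answers end with an "Answer:" line (both assumptions for illustration):

```python
import re
from collections import Counter

def call_model(prompt: str) -> str:
    # Hypothetical stand-in; should sample at temperature > 0 so chains differ.
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    # Assumes each chain ends with a line like "Answer: 42"; adjust per task.
    m = re.search(r"Answer:\s*(.+)", completion)
    return m.group(1).strip() if m else completion.strip()

def self_consistent_cot(question: str, n_samples: int = 5) -> str:
    # Sample several reasoning chains, then majority-vote on the final answers.
    prompt = question + "\nLet's think step by step."
    answers = [extract_answer(call_model(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```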

Reducing Hallucination

  • Techniques such as Retrieval Augmented Generation (RAG) and Chain-of-Verification (CoVe) curb hallucination by augmenting prompts with retrieved evidence and explicit verification steps, leading to more accurate and reliable responses; a minimal sketch combining the two ideas follows.
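
A minimal sketch of the two ideas together. The word-overlap retriever is a toy stand-in for a real search index, and `call_model` is again a hypothetical stub; the prompt wording is illustrative, not the papers' exact prompts.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM completion call.
    raise NotImplementedError

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Toy retriever: rank passages by word overlap with the query.
    words = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(words & set(p.lower().split())))[:k]

def rag_answer(query: str, corpus: list[str]) -> str:
    # RAG: ground the prompt in retrieved passages.
    context = "\n".join(retrieve(query, corpus))
    return call_model(f"Using only the context below, answer the question.\n\n"
                      f"Context:\n{context}\n\nQuestion: {query}")

def verified_answer(query: str, corpus: list[str]) -> str:
    # CoVe-style pass: draft, plan verification questions, then revise.
    draft = rag_answer(query, corpus)
    checks = call_model(f"List short questions that would verify this answer:\n{draft}")
    return call_model(f"Question: {query}\nDraft answer: {draft}\n"
                      f"Verification questions:\n{checks}\n"
                      f"Answer each question, then give a revised final answer.")
```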

Code Generation and Execution

  • The survey also examines code generation, highlighting approaches like Scratchpad Prompting, Structured Chain-of-Thought (SCoT), and Chain-of-Code (CoC), which improve the precision and logical flow of code-related tasks; an SCoT-style prompt is sketched below.
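
A minimal sketch of an SCoT-style prompt, which asks the model for a structural outline before any code. The two-pass template below is an illustrative assumption, not the paper's exact prompt; `call_model` is again a hypothetical stub.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM completion call.
    raise NotImplementedError

SCOT_TEMPLATE = (
    "Solve the programming task in two passes.\n"
    "Pass 1: outline the solution using only sequence, branch, and loop structures.\n"
    "Pass 2: translate the outline into Python code.\n\n"
    "Task: {task}\n"
)

def structured_code_gen(task: str) -> str:
    # The structural outline constrains the model before code is written.
    return call_model(SCOT_TEMPLATE.format(task=task))
```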

Managing Emotions and Tone

  • A further area explored is the management of emotions and tone through prompting, illustrating the method's versatility beyond technical applications to include human-like understanding and generation of content; an EmotionPrompt-style example is sketched below.
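
A minimal sketch of EmotionPrompt-style prompting: an emotional stimulus is simply appended to the task instruction. The stimulus string follows the pattern reported in the EmotionPrompt line of work, and `call_model` is again a hypothetical stub.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM completion call.
    raise NotImplementedError

# One stimulus of the kind reported in the EmotionPrompt work.
STIMULUS = "This is very important to my career."

def emotion_prompt(task: str) -> str:
    # The stimulus is appended verbatim; the task itself is unchanged.
    return call_model(f"{task} {STIMULUS}")
```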

Theoretical and Practical Implications

This survey not only underscores the practical successes achieved across various tasks, including language generation, reasoning, and code execution, but also explores the theoretical understanding of how different prompt engineering methodologies can influence model behavior. It highlights the dual benefit of prompt engineering: enhancing model performance while providing insights into model cognition and decision-making processes.

Future Directions in Prompt Engineering

The comprehensive analysis in this survey pinpoints several future directions, including the exploration of meta-learning approaches for prompt optimization, the development of hybrid models combining different prompting techniques, and the imperative need for addressing ethical concerns. The ongoing efforts to mitigate biases, enhance factual accuracy, and improve the interpretability of models through advanced prompt engineering methodologies are critical focal points for future research.

Conclusion

In conclusion, this survey not only serves as a critical resource for researchers and practitioners exploring the field of prompt engineering but also lays down a roadmap for future investigations. By systematically organizing and analyzing the plethora of existing techniques and their applications, it brings to light the immense potential of prompt engineering in harnessing the capabilities of LLMs and VLMs, advocating for a balanced approach towards its development, with an eye toward ethical and responsible AI use.

Explain it Like I'm 14

Overview

This paper is about “prompt engineering,” which is the art of writing good instructions for AI large language models (like ChatGPT) and vision-language models (AIs that understand both pictures and text). The main idea is that, instead of retraining a huge AI every time you want it to do a new task, you can guide it with smart prompts so it behaves how you want.

Think of an LLM as a very smart student who has read a lot but can be a bit literal. A prompt is like the way you ask the student a question and give directions—clear instructions can make the student do better without changing how the student thinks inside.

Goals and Research Questions

The paper aims to:

  • Organize and explain the many different prompt engineering techniques people have invented.
  • Show where and how these techniques are used (for tasks like reasoning, answering questions, writing code, and reducing mistakes).
  • Compare their strengths and weaknesses and the kinds of models and datasets they use.
  • Present a handy “map” (taxonomy) and a summary table so researchers and users can choose the right prompting methods for their needs.
  • Point out open problems and future directions for prompt engineering.

How Did They Do It? (Methods)

This is a survey paper, which means the authors did not run just one big experiment. Instead, they:

  • Collected and read many research papers on 29 different prompting techniques.
  • Grouped these techniques by what they help with (such as reasoning, handling emotions, using tools, or writing code).
  • Summarized how each technique works in everyday language, the tasks it’s used for, the models involved (like GPT-3, GPT-4, Llama, T5), and the datasets used to test them (benchmarks where models are scored).
  • Discussed pros and cons of each technique.
  • Built a taxonomy (a structured guide) and a table that make it easy to compare methods.

If “taxonomy” sounds fancy, think of it like a well-labeled closet: each shelf is a category (reasoning, reducing mistakes, etc.), and each item (technique) is placed where it fits best.

Main Findings and Why They Matter

The big takeaway is that prompt engineering can dramatically improve what AI models can do, often without retraining them. Here are a few standout techniques:

  • Zero-shot and few-shot prompting:
    • Zero-shot means you just give clear instructions, no examples. It’s like telling a student, “Do this new type of math problem,” and they try using what they already know.
    • Few-shot means you include a few example Q&As in the prompt. Even a small number of examples can help models do much better, especially on tricky tasks.
  • Chain-of-Thought (CoT): You ask the model to “show its steps” like solving a math problem line by line. This often boosts accuracy in reasoning tasks.
  • Auto-CoT and Self-Consistency:
    • Auto-CoT automatically generates examples of reasoning steps so you don’t have to handwrite them.
    • Self-consistency has the model try multiple solution paths and then pick the most common answer, like taking a vote among several attempts.
  • Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT):
    • These go beyond simple step-by-step thinking. The model explores different branches or networks of ideas, then chooses the best path forward—similar to brainstorming multiple routes to a solution and picking the best one.
  • RAG (Retrieval-Augmented Generation): The model “looks things up” in a knowledge base and uses that information in its answer. This helps it be more accurate and current, reducing made-up facts (hallucinations).
  • ReAct and ART (Reason + Action/Tools): The model both thinks and takes actions (like calling tools, searching, or using a calculator) in a loop. This helps it handle complex, real-world tasks better; a tiny loop sketch follows this list.
  • CoVe (Chain-of-Verification): The model plans questions to check its own work, answers those questions, and then revises its final answer—like a self-checking worksheet.
  • EmotionPrompt: Adding short motivational or emotional cues can surprisingly help the model respond better on some tasks—like giving the AI a quick pep talk.
  • PoT and CoC (Program-of-Thoughts and Chain-of-Code): The model writes code or pseudocode to think precisely through numerical or logical problems, cutting down on errors.
  • SCoT (Structured Chain-of-Thought): For code generation, the model follows clear program structures (sequence, loops, branches), which leads to more accurate code than plain natural-language reasoning.
  • APE (Automatic Prompt Engineer) and OPRO (Optimization by Prompting):
    • APE automatically creates and selects effective instructions.
    • OPRO uses natural language prompts to improve solutions over iterations, acting like an optimizer—not just answering questions, but improving strategies to get better results.
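
To make the ReAct item above concrete, here is a minimal loop sketch: the model alternates free-form thoughts with tool calls written as `Action: tool[input]`, and each tool result is fed back into the transcript as an observation. The transcript format and `call_model` stub are illustrative assumptions, not the paper's exact protocol.

```python
import re

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM completion call.
    raise NotImplementedError

def react(question: str, tools: dict, max_steps: int = 5) -> str:
    # Alternate model Thought/Action steps with real tool calls,
    # feeding each Observation back into the growing transcript.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_model(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        done = re.search(r"Final Answer:\s*(.+)", step)
        if done:
            return done.group(1).strip()
        act = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if act and act.group(1) in tools:
            observation = tools[act.group(1)](act.group(2))
            transcript += f"Observation: {observation}\n"
    return transcript  # no final answer within the step budget
```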

Across many benchmarks (like GSM8K for math word problems, TriviaQA for question answering, HumanEval/MBPP for coding), these techniques often show strong improvements. Examples include big jumps in success rates when using ToT for puzzles, better factual accuracy with RAG, and clearer reasoning with CoT, Auto-CoT, and self-consistency.

Implications and Impact

  • For everyday users: Better prompts mean better answers—from clearer explanations and fewer mistakes to smarter problem-solving. You don’t need to retrain the model; you can just ask better.
  • For developers and researchers: This survey acts like a guidebook—helping you pick the right technique for your task (reasoning, coding, using tools, reducing hallucinations, etc.) and understand trade-offs.
  • For the future of AI: Prompt engineering opens doors to flexible, powerful systems that can reason, check themselves, and use tools. The paper also highlights challenges such as bias, hallucinations, and understanding how models think. It points to promising directions like combining multiple techniques (hybrid prompts), meta-learning, and more careful, ethical use.

In short, prompt engineering is a practical, powerful way to get more from AI models today—and this paper shows how the field is growing fast, where it’s working well, and where we need to improve next.

