
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

(2310.03714)
Published Oct 5, 2023 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract

The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy

Overview

  • The paper introduces DSPy, a novel programming model for constructing and optimizing language model (LM) pipelines using declarative constructs.

  • DSPy modules can self-improve by learning from examples and utilize techniques like prompting, fine-tuning, and augmentation.

  • A compiler optimizes DSPy pipelines to enhance quality or reduce costs, exemplified by significant accuracy improvements in complex tasks like math word problems and question answering.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

The paper "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines" introduces a novel programming model named DSPy. This model is tailored for constructing and optimizing pipelines of language models (LMs) using declarative constructs. The primary objective is to address the limitations of current LM pipelines which extensively rely on hard-coded prompt templates developed through trial and error. The paper introduces DSPy as a more systematic and robust approach for creating and enhancing LM pipelines.

Contributions and Key Concepts

  1. DSPy Programming Model: DSPy abstracts LM pipelines as text transformation graphs in which LMs are invoked via declarative modules. These modules avoid the pitfalls of hard-coded prompts, making the system more adaptable and systematic (a minimal usage sketch follows this list).
  2. Parameterized Declarative Modules: DSPy modules can learn from examples by creating and collecting demonstrations, thus enhancing their performance iteratively through techniques like prompting, fine-tuning, and augmentation.
  3. Compiler for Optimization: A key innovation in DSPy is a compiler that optimizes any given DSPy pipeline to maximize a specified metric. The compiler bootstraps useful LM behaviors and tunes the pipelines automatically, aiming to enhance the quality or reduce the cost of the pipeline operations.
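To make these concepts concrete, here is a minimal sketch of a DSPy program, assuming the public API of the open-source dspy library around the time of the paper (dspy.OpenAI, dspy.settings.configure); the model name, signature fields, and example question are illustrative rather than taken from the paper, and API details may differ across library versions.

```python
import dspy

# Configure the underlying LM (the model name here is illustrative).
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# A signature declares the input/output behavior of a step in natural language,
# replacing a hand-written prompt template.
class GenerateAnswer(dspy.Signature):
    """Answer the question concisely."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a short answer")

# A declarative module; ChainOfThought adds an intermediate reasoning step
# before producing the declared output field.
qa = dspy.ChainOfThought(GenerateAnswer)
prediction = qa(question="What is the capital of France?")
print(prediction.answer)
```

The same program can later be compiled against a metric without changing this code, which is what separates the pipeline's logic from its prompting strategy.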

Case Studies and Results

The paper presents compelling case studies demonstrating DSPy’s efficacy:

  1. Math Word Problems (GSM8K):

    • Three DSPy programs were evaluated: a simple prediction model (vanilla), a chain-of-thought model (CoT), and a multi-stage reasoning model (ThoughtReflection).
    • Compiling with DSPy yielded strong improvements; for instance, the compiled ThoughtReflection program raised accuracy from 65% to 86.7% with GPT-3.5, substantially outperforming standard few-shot prompting.
  2. Complex Question Answering (HotPotQA):

    • Evaluated programs included vanilla, ReAct, and a BasicMultiHop program (sketched below).
    • The compiled BasicMultiHop program achieved notable gains: answer exact match (EM) on the development set rose from 36.9% to 54.7% when optimized with DSPy.
    • The results highlighted DSPy's ability to make even smaller LMs competitive with larger, proprietary models by compiling optimized pipelines.
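As a rough illustration of the multi-hop program described above, the following sketch composes retrieval and chain-of-thought modules in the style the paper uses for BasicMultiHop. The hop count, field names, and the assumption that a retrieval model has already been configured (e.g., via dspy.settings.configure(rm=...)) are illustrative, not a verbatim reproduction of the paper's code.

```python
import dspy

class BasicMultiHop(dspy.Module):
    """Multi-hop QA: generate a query, retrieve passages, repeat, then answer."""

    def __init__(self, passages_per_hop=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_query = dspy.ChainOfThought("context, question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for _ in range(2):  # two retrieval hops (an illustrative choice)
            query = self.generate_query(context=context, question=question).search_query
            context += self.retrieve(query).passages
        return self.generate_answer(context=context, question=question)
```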

Technical Innovations

  1. Signatures: Unlike hand-crafted prompts, DSPy signatures are natural-language typed declarations that abstract the input/output behavior of a module, allowing versatile adaptation across different tasks.
  2. Modules: Chain-of-thought and ReAct-style reasoning are embodied as parameterized modules that can emulate complex multi-stage problem-solving techniques. Demonstrations are bootstrapped to replace manual examples.
  3. Teleprompters: Modular strategies that compile DSPy programs by optimizing prompts and fine-tuning strategies. Teleprompters automate the creation of effective few-shot demonstrations, thereby improving modular pipelines through systematic bootstrapping (a compilation sketch follows this list).
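Below is a minimal compilation sketch, assuming the BootstrapFewShot teleprompter from dspy.teleprompt; the metric, training example, and program here are placeholders for illustration rather than the paper's actual setup.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Placeholder metric: exact match on the answer field.
def answer_exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Placeholder training set; real data would come from the task's train split.
trainset = [
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
]

program = dspy.ChainOfThought("question -> answer")

# The teleprompter runs the program on training inputs, keeps traces whose
# outputs pass the metric, and installs them as few-shot demonstrations.
teleprompter = BootstrapFewShot(metric=answer_exact_match)
compiled_program = teleprompter.compile(program, trainset=trainset)
```

The compiled program is a drop-in replacement for the original: it exposes the same interface, but its internal prompts now carry the bootstrapped demonstrations.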

Implications and Future Directions

Practical Implications:

  • Modularity and Scalability: The modular approach ensures that improvements in one part of the LM pipeline can be propagated through the entire system, improving overall scalability.
  • Efficiency: DSPy’s ability to compile efficient programs not only reduces reliance on proprietary, larger LMs but also makes smaller, open models more effective and suitable for real-world applications.

Theoretical Implications:

  • Generalization: Parameterized modules and automated bootstrapping facilitate generalization across various tasks and domains, potentially pushing the boundaries of what is achievable with LMs without extensive manual intervention.
  • Optimization Frameworks: The integration of optimization algorithms (e.g., random search, Optuna-based search) within teleprompters showcases a step toward more adaptive and intelligent systems (see the sketch below).
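For example, swapping in a random-search variant of the bootstrapping teleprompter only changes the optimizer line. The class and parameter names below follow the dspy.teleprompt module but should be read as a sketch with placeholder data, and may vary across library versions.

```python
import dspy
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Placeholder metric and training data, as in the earlier compilation sketch.
def metric(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

trainset = [dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question")]
program = dspy.ChainOfThought("question -> answer")

# Samples several candidate sets of bootstrapped demonstrations and keeps the
# compiled program that scores best on the metric over the training set.
optimizer = BootstrapFewShotWithRandomSearch(metric=metric, num_candidate_programs=8)
compiled_program = optimizer.compile(program, trainset=trainset)
```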

Speculation on Future Developments in AI:

  • Unified AI Pipelines: DSPy hints at the future of AI development where modular, self-improving pipelines become the norm. This could democratize AI development by lowering the barrier for creating high-performance systems.
  • Adaptive Systems: Future AI systems might leverage frameworks like DSPy to adapt dynamically to new tasks and data, increasing robustness and reducing the need for static model retraining.

Conclusion

DSPy provides a significant step forward in the systematic development and optimization of LM pipelines. By abstracting and compiling declarative modules into highly effective, self-improving systems, DSPy promises to reshape how AI pipelines are constructed and deployed, fostering a more modular, efficient, and scalable approach to leveraging language models for complex tasks.
