Query Rewriting via Large Language Models (2403.09060v1)

Published 14 Mar 2024 in cs.DB

Abstract: Query rewriting is one of the most effective techniques for coping with poorly written queries before passing them down to the query optimizer. Manual rewriting is not scalable, as it is error-prone and requires deep expertise. Similarly, traditional query rewriting algorithms can only handle a small subset of queries: rule-based techniques do not generalize to new query patterns and synthesis-based techniques cannot handle complex queries. Fortunately, the rise of LLMs, equipped with broad general knowledge and advanced reasoning capabilities, has created hopes for solving some of these previously open problems. In this paper, we present GenRewrite, the first holistic system that leverages LLMs for query rewriting. We introduce the notion of Natural Language Rewrite Rules (NLR2s), and use them as hints to the LLM but also a means for transferring knowledge from rewriting one query to another, and thus becoming smarter and more effective over time. We present a novel counterexample-guided technique that iteratively corrects the syntactic and semantic errors in the rewritten query, significantly reducing the LLM costs and the manual effort required for verification. GenRewrite speeds up 22 out of 99 TPC queries (the most complex public benchmark) by more than 2x, which is 2.5x--3.2x higher coverage than state-of-the-art traditional query rewriting and 2.1x higher than the out-of-the-box LLM baseline.

Summary

The paper introduces GenRewrite, a system that leverages LLMs with counterexample-guided correction for SQL query rewriting.
It employs Natural Language Rewrite Rules (NLR2s) to transfer rewrite knowledge between queries without relying on query-specific data.
Evaluation on the TPC-DS benchmark shows GenRewrite speeds up 33.3% of queries by more than 10%, and 22 of 99 queries by more than 2x, outperforming state-of-the-art methods.

Query Rewriting via LLMs

Introduction

The paper "Query Rewriting via LLMs" presents GenRewrite, a novel system leveraging LLMs for rewriting SQL queries to optimize performance. This research addresses significant limitations of traditional query rewriting systems and explores the capabilities of LLMs to enhance query performance autonomously.

Challenges in Traditional Query Rewriting

Traditional query rewriting methods rely heavily on rule-based systems. Because these systems depend on predefined rewrite rules, they do not generalize to new or unforeseen query patterns. Synthesis-based approaches are not bound to predefined rules, but they cannot handle complex queries such as those in the TPC-DS benchmark.

GenRewrite: Leveraging LLMs for Query Optimization

GenRewrite introduces several innovative techniques to overcome these challenges:

Natural Language Rewrite Rules (NLR2s): NLR2s are textual explanations that summarize a query rewrite. They carry knowledge gained from rewriting one query over to other queries, serving both as hints to the LLM and as explanations for users. NLR2s leave out query-specific data, which keeps the transferred knowledge general.
Counterexample-Guided Correction: This technique iteratively refines potential rewrites through semantic and syntactic corrections. The LLM addresses semantic mismatches, while database feedback ensures syntactic correctness. This dual-phase correction process ensures that candidate queries align with the intended semantics and are executable.
Figure 1: High-level Workflow of GenRewrite showcasing iterative processes.
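
The correction loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `propose` callable stands in for an LLM call and its signature is hypothetical, SQLite is used only because it ships with Python, and comparing sorted result sets on a single test database is an assumed stand-in for the paper's equivalence checking. Matching results on one database do not prove equivalence; they only fail to find a counterexample.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical LLM interface (an assumption, not from the paper):
# (original_sql, previous_candidate, feedback) -> new candidate SQL.
RewriteFn = Callable[[str, Optional[str], Optional[str]], str]


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return its rows sorted, so row order is ignored."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


def correct_rewrite(
    conn: sqlite3.Connection,
    original_sql: str,
    propose: RewriteFn,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a candidate that executes and matches the original on conn, else None."""
    expected = run(conn, original_sql)
    candidate: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(original_sql, candidate, feedback)
        try:
            actual = run(conn, candidate)
        except sqlite3.Error as exc:
            # Syntactic phase: pass the database's error message back.
            feedback = f"syntax error: {exc}"
            continue
        if actual == expected:
            return candidate
        # Semantic phase: this database is a counterexample to equivalence.
        feedback = f"results differ: expected {expected}, got {actual}"
    return None
```

Bounding the loop with `max_rounds` caps the number of LLM calls per query, which is one simple way to keep the cost of correction under control.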

Evaluation and Performance

In the empirical evaluation on the TPC-DS benchmark, GenRewrite speeds up 33.3% of the queries by more than 10%, surpassing state-of-the-art methods such as LearnedRewrite (LR) and Fusion. At the stricter 2x threshold, it speeds up 22 of the 99 queries, which is 2.5x to 3.2x higher coverage than state-of-the-art traditional query rewriting and 2.1x higher than the out-of-the-box LLM baseline.

Figure 3: Comparison of speedup achieved by different approaches on queries with at least 2x speedup.
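
To make the coverage figures concrete, the sketch below counts how many queries exceed a given speedup threshold. It assumes speedup is defined as original runtime divided by rewritten runtime; the function names and the timings are invented for illustration and are not measurements from the paper.

```python
def speedup(original_seconds: float, rewritten_seconds: float) -> float:
    """Speedup factor, assumed to be original runtime over rewritten runtime."""
    return original_seconds / rewritten_seconds


def coverage(timings: dict, threshold: float) -> int:
    """Count queries whose speedup is strictly greater than the threshold."""
    return sum(
        1 for before, after in timings.values() if speedup(before, after) > threshold
    )


# Invented (original, rewritten) runtimes in seconds, for illustration only.
timings = {
    "q1": (12.0, 3.0),
    "q2": (8.0, 7.5),
    "q3": (5.0, 4.0),
    "q4": (20.0, 10.0),
}

print(coverage(timings, 2.0))  # queries sped up by more than 2x
print(coverage(timings, 1.1))  # queries sped up by more than 10%

# The abstract's headline number as a fraction of the benchmark.
print(f"{22 / 99:.1%}")
```

Under this definition, the 22 of 99 queries reported in the abstract correspond to roughly 22.2% of the benchmark at the 2x threshold.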

Contributions and Implications

First Comprehensive LLM Query Rewriting Analysis: This work offers an in-depth analysis of the challenges and potentials of LLMs for query rewriting, setting a foundation for future research.
Introduction of NLR2s: GenRewrite enhances the LLM's capability by employing NLR2s, which help in achieving effective rewrites and knowledge transfer.
Iterative Correction Approach: The introduction of a counterexample-guided correction method significantly improves rewrite accuracy, making the system robust against errors typical in LLM responses.
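
One way to picture how NLR2s carry knowledge from one query to the next is a small rule store whose best entries are added to the prompt as hints. The sketch below is an assumption-laden illustration: the `NLR2` and `RuleStore` classes, the `wins` counter, and the prompt wording are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class NLR2:
    text: str      # natural-language description of a rewrite, free of query-specific data
    wins: int = 0  # assumed bookkeeping: how many queries this rule has helped


@dataclass
class RuleStore:
    rules: dict = field(default_factory=dict)

    def record(self, text: str, helped: bool) -> None:
        """Add the rule if it is new, and credit it when it led to a faster rewrite."""
        rule = self.rules.setdefault(text, NLR2(text))
        if helped:
            rule.wins += 1

    def top(self, k: int) -> list:
        """Return the texts of the k rules with the most wins."""
        ranked = sorted(self.rules.values(), key=lambda r: r.wins, reverse=True)
        return [r.text for r in ranked[:k]]


def build_prompt(sql: str, hints: list) -> str:
    """Compose a rewrite request that lists earlier rules as hints."""
    lines = ["Rewrite this SQL query to run faster while returning the same result."]
    if hints:
        lines.append("Rewrite rules that helped on other queries:")
        lines += [f"- {hint}" for hint in hints]
    lines.append(sql)
    return "\n".join(lines)


store = RuleStore()
store.record("Replace a correlated subquery with a join on a pre-aggregated table.", helped=True)
store.record("Push filters below joins where possible.", helped=False)
prompt = build_prompt("SELECT 1", store.top(1))
```

Because the rules are plain text with no table or column names, the same hint can be reused for a structurally similar query over a different schema.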

Conclusion

GenRewrite demonstrates a pioneering approach in automating query rewriting by harnessing the power of LLMs. Its ability to adaptively transfer knowledge and refine rewrites iteratively makes it a promising solution for complex query optimization. The insights and methodologies outlined in this paper may drive further developments in using LLMs for other database-related tasks, emphasizing the versatile application of advanced AI models in the field of database management and optimization.

The research opens pathways for further work on AI-driven query optimization, potentially leading to more intelligent and adaptive database management systems.