Natural language to SQL in low-code platforms (2308.15239v1)

Published 29 Aug 2023 in cs.AI

Abstract: One of the developers' biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we collect, label, and validate data covering the SQL queries most often performed by OutSystems users. We use that data to train a NL model that generates SQL. Alongside this, we describe the entire pipeline, which comprises a feedback loop that allows us to quickly collect production data and use it to retrain our SQL generation model. Using crowd-sourcing, we collect 26k NL and SQL pairs and obtain an additional 1k pairs from production data. Finally, we develop a UI that allows developers to input a NL query in a prompt and receive a user-friendly representation of the resulting SQL query. We use A/B testing to compare four different models in production and observe a 240% improvement in terms of adoption of the feature, 220% in terms of engagement rate, and a 90% decrease in failure rate when compared against the first model that we put into production, showcasing the effectiveness of our pipeline in continuously improving our feature.

Summary

  • The paper presents a comprehensive NL2SQL pipeline incorporating data collection, model training, and automated deployment to convert natural language queries to SQL in low-code platforms.
  • It utilizes a T5-based generator paired with a CodeBERT ranker, with constrained decoding ensuring that only valid SQL is generated and improving overall performance.
  • Experimental results demonstrate significant improvements with a 240% boost in adoption, 220% increased engagement, and a 90% drop in failure rates compared to baseline models.

NL2SQL in Low-Code Platforms

The paper "Natural language to SQL in low-code platforms" (2308.15239) introduces a comprehensive pipeline for converting NL queries into SQL queries within low-code platforms, addressing the challenges faced by developers with limited SQL expertise. The pipeline encompasses data collection, model training, and deployment, incorporating a feedback loop for continuous improvement. The solution leverages crowdsourcing and production data to train a T5-based model with constrained decoding, enhancing adoption, engagement, and reducing failure rates compared to baseline models.

Methodological Pipeline

The pipeline consists of three primary components: data collection, model training, and model deployment (Figure 1).

Figure 1: Machine learning pipeline comprising the three main components: data collection (left), model training (top right), and model deployment (bottom right).

Data Collection

Data collection involves three sources: public datasets (Spider), crowdsourced labeled data (MTurk), and production data. Public datasets like Spider are used as a starting point but are augmented with MTurk data to better reflect real-world scenarios. MTurk is employed to label SQL queries with corresponding NL utterances, ensuring a representative set of examples that mimic users' SQL usage distribution. Production data is gathered to analyze model performance and collect SQL queries edited by users, further refining the training data.
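A minimal sketch of how the three data sources might be merged into a single training set with provenance tags and basic deduplication. The file names, column layout, and merging logic are illustrative assumptions, not the paper's actual schema.

```python
import pandas as pd

# Hypothetical column layout: each source provides (nl, sql) pairs.
spider = pd.read_json("spider_pairs.json")          # public dataset (assumed local export)
mturk = pd.read_json("mturk_pairs.json")            # crowdsourced labels
production = pd.read_json("production_pairs.json")  # queries edited by users in production

for df, source in [(spider, "spider"), (mturk, "mturk"), (production, "production")]:
    df["source"] = source

# Combine, drop exact duplicates on the (nl, sql) pair, and keep provenance
# so later stages (e.g. quality estimation) can treat sources differently.
train = (
    pd.concat([spider, mturk, production], ignore_index=True)
      .drop_duplicates(subset=["nl", "sql"])
)
print(train["source"].value_counts())
```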

Quality Estimation

To ensure the quality of the data, a quality estimation system is implemented. This system filters out incorrect pairs of SQL queries and utterances using both rule-based systems and an ML-based estimator. The ML-based estimator, fine-tuned with CodeBERT, detects whether a SQL query matches an utterance, improving the accuracy of the training data.
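A hedged sketch of the ML-based estimator: a CodeBERT encoder fine-tuned (elsewhere) as a binary classifier over (utterance, SQL) pairs. The checkpoint, label convention, and decision threshold are assumptions; the paper does not publish its exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# CodeBERT classifier that predicts whether an utterance matches a SQL query.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2  # 0 = mismatch, 1 = match (assumed labels)
)
model.eval()

def matches(utterance: str, sql: str, threshold: float = 0.5) -> bool:
    """Return True if the estimator believes the SQL answers the utterance."""
    inputs = tokenizer(utterance, sql, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    return probs[0, 1].item() >= threshold

# Keep only pairs that pass both the rule-based checks and the ML estimator.
pair = ("show all employees hired after 2020",
        "SELECT * FROM Employee WHERE HireDate > '2020-12-31'")
print(matches(*pair))
```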

SQL Generation

SQL generation is achieved using T5QL, a Seq2Seq method that converts NL queries into SQL queries (Figure 2). T5QL comprises a generator (a T5 model) and a ranker (a CodeBERT model). The generator is trained to output SQL queries corresponding to the input sequence, while the ranker re-ranks the generator's candidates to improve performance. Constrained decoding, using a context-free grammar (CFG) of SQL statements, ensures that only valid SQL is generated.

Figure 2: T5QL model architecture (from Arcadinho et al., 2022).
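The following sketch illustrates the generate-then-rank idea with off-the-shelf Hugging Face components: a T5 generator proposes beam candidates and a CodeBERT-style ranker scores each (utterance, SQL) pair. It uses generic checkpoints and omits the CFG-constrained decoding that T5QL applies during generation, so it is an approximation rather than the authors' implementation.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          T5ForConditionalGeneration, T5TokenizerFast)

# Generator: a T5 model fine-tuned (elsewhere) on NL -> SQL pairs. Checkpoint is a placeholder.
gen_tok = T5TokenizerFast.from_pretrained("t5-base")
generator = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

# Ranker: a CodeBERT-style classifier scoring how well a candidate SQL matches the utterance.
rank_tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
ranker = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2).eval()

def nl_to_sql(utterance: str, schema: str, num_candidates: int = 4) -> str:
    prompt = f"translate to SQL: {utterance} | schema: {schema}"
    inputs = gen_tok(prompt, return_tensors="pt")
    candidates = generator.generate(
        **inputs, num_beams=num_candidates,
        num_return_sequences=num_candidates, max_new_tokens=128)
    sqls = gen_tok.batch_decode(candidates, skip_special_tokens=True)

    # Re-rank beam candidates with the ranker and return the highest-scoring one.
    scores = []
    for sql in sqls:
        enc = rank_tok(utterance, sql, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(torch.softmax(ranker(**enc).logits, dim=-1)[0, 1].item())
    return max(zip(scores, sqls))[1]
```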

Model Training

The model is retrained iteratively, incorporating new data points from SQL templates where the model performed poorly. Active learning, using clustering on the predicted SQL templates, guides the selection of data to improve the model's performance. Model quality is evaluated using exact match, execution match, and the tree edit distance (TED) between the abstract syntax trees (ASTs) of gold and predicted SQL queries. Server performance is also monitored to ensure minimal impact on response time.
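A sketch of the active-learning selection step described above: cluster predicted SQL templates and prioritize examples from clusters where the model does poorly. The templating heuristic, vectorization, and error signal are illustrative choices, not the paper's exact procedure.

```python
import re
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def to_template(sql: str) -> str:
    """Crude SQL templating: replace string and numeric literals with placeholders."""
    sql = re.sub(r"'[^']*'", "<str>", sql)
    sql = re.sub(r"\b\d+(\.\d+)?\b", "<num>", sql)
    return sql.lower()

def select_for_labeling(predicted_sqls, errors, n_clusters=10, per_cluster=20):
    """errors[i] is 1 if prediction i was wrong (e.g. failed exact/execution match)."""
    templates = [to_template(s) for s in predicted_sqls]
    X = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(templates)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    # Rank clusters by error rate and pick examples from the worst ones for relabeling.
    by_error = sorted(range(n_clusters),
                      key=lambda c: -sum(e for e, l in zip(errors, labels) if l == c)
                                    / max(1, sum(l == c for l in labels)))
    picked = []
    for c in by_error:
        picked.extend([i for i, l in enumerate(labels) if l == c][:per_cluster])
    return picked
```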

Deployment

Model deployment is automated using a GitOps-based CI/CD approach. The service is containerized and runs in a Kubernetes infrastructure. Deployment strategies include rolling updates, progressive rollouts, and A/B testing, ensuring minimal downtime and allowing for thorough evaluation of new models in production.
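A minimal sketch of how A/B traffic splitting between model variants could work at the service layer. The variant names and weights are hypothetical and not taken from the paper; in practice the rollout itself is handled by the GitOps/Kubernetes tooling.

```python
import hashlib

# Hypothetical variant weights for an A/B test between two model versions.
VARIANTS = [("model_a", 0.5), ("model_b", 0.5)]

def assign_variant(user_id: str) -> str:
    """Deterministically map a user to a variant so they see a consistent model."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    cumulative = 0.0
    for name, weight in VARIANTS:
        cumulative += weight
        if bucket < cumulative:
            return name
    return VARIANTS[-1][0]

print(assign_variant("dev-1234"))  # stable assignment per user across sessions
```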

Feedback Loop

A feedback loop is established to continuously improve the model. User feedback, collected through product telemetry data, is used to identify areas for improvement and retrain the model. This loop creates a virtuous cycle, where better models lead to increased usage and more feedback.
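A sketch of how the telemetry-driven feedback loop could feed retraining: SQL that a user edited after generation is treated as a corrected label for the original utterance. The event field names are assumptions, not the product's actual telemetry schema.

```python
def harvest_feedback(telemetry_events):
    """Turn telemetry events into new (nl, sql) training pairs.

    Each event is assumed to carry the user's utterance, the generated SQL,
    and the final SQL after any manual edits.
    """
    new_pairs = []
    for event in telemetry_events:
        generated, final = event["generated_sql"], event["final_sql"]
        if final and final != generated:
            # The user corrected the model: a high-value training example.
            new_pairs.append({"nl": event["utterance"], "sql": final, "source": "production"})
    return new_pairs
```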

Experimental Results

The paper presents both offline and online experiments to evaluate the effectiveness of the proposed pipeline. Offline experiments on an internal test set demonstrated significant improvements using T5QL with constrained decoding and FP16 precision. The maximum input size was increased, and response times were reduced while maintaining a low TED. Online experiments, conducted through A/B testing, showed a 240% increase in adoption, a 220% increase in engagement rate, and a 90% reduction in failure rate compared to the baseline model (RATSQL).
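The offline gains rely in part on running the T5 generator in half precision. A minimal sketch of FP16 inference with Hugging Face Transformers, assuming a GPU and a generic T5 checkpoint rather than the paper's production model:

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained(
    "t5-base", torch_dtype=torch.float16).to("cuda").eval()

inputs = tokenizer("translate to SQL: list all customers from Lisbon",
                   return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, num_beams=4, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```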

The paper references several key works in the areas of A/B testing and NL2SQL. A/B testing has been widely used for website development and is increasingly being applied to ML model deployment. NL2SQL research has explored various approaches, including enhancing schema representation, generating code from fragments, and using intermediate representations with relation-aware self-attention. Constrained decoding with LLMs has also been shown to improve accuracy by reducing hallucinations.

Conclusion

The work presents a comprehensive NL2SQL pipeline for low-code platforms, emphasizing data-driven model training and a user-centric approach. The pipeline leverages crowdsourcing, production data, and active learning to continuously improve model performance. The experimental results demonstrate significant improvements in adoption, engagement, and failure rates, highlighting the effectiveness of the proposed approach. Future work involves exploring model shadowing to estimate model performance without premature deployment.
