Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! (2405.11706v1)

Published 20 May 2024 in cs.AI, cs.DB, cs.IR, and cs.LO

Abstract: There is increasing evidence that question-answering (QA) systems with LLMs, which employ a knowledge graph/semantic representation of an enterprise SQL database (i.e. Text-to-SPARQL), achieve higher accuracy compared to systems that answer questions directly on SQL databases (i.e. Text-to-SQL). Our previous benchmark research showed that by using a knowledge graph, the accuracy improved from 16% to 54%. The question remains: how can we further improve the accuracy and reduce the error rate? Building on the observations of our previous research, where the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC): detects errors by leveraging the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, and 2) LLM Repair: uses the error explanations with an LLM to repair the SPARQL query. Using the chat-with-the-data benchmark, our primary finding is that our approach increases the overall accuracy to 72%, including an additional 8% of "I don't know" results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, provides higher accuracy for LLM-powered question answering systems.


Summary

  • The paper introduces two key components, OBQC and LLM Repair, to enhance the accuracy of LLM-based QA systems through ontological validation.
  • It leverages domain and range rules to correct SPARQL queries, achieving an overall accuracy of 72% and reducing error rates to 20%.
  • The method is tested under various question and schema complexities, demonstrating robust performance improvements even in high-complexity scenarios.

Increasing the Accuracy of LLM Question-Answering Systems with Ontologies

The paper "Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!" explores mechanisms to enhance the performance of LLM-based question-answering systems by incorporating ontologies into the query generation and correction process. It introduces two main components, Ontology-Based Query Check (OBQC) and LLM Repair, which work together to reduce error rates and increase accuracy in complex querying scenarios.

Introduction

Question answering (QA) systems powered by LLMs have become increasingly popular in enterprise environments, especially for querying information from SQL databases. However, the raw Text-to-SQL approach often suffers from lower accuracy rates. Previous benchmark research revealed that utilizing a semantic representation of enterprise data through knowledge graphs significantly improved accuracy rates—from 16% to 54%. This research proposes additional improvements by utilizing the semantic capabilities of ontologies to detect and correct errors in LLM-generated SPARQL queries. The primary goal is to deliver more reliable answers by enforcing semantic constraints during query formulation and iteration.

Methodology

Ontology-Based Query Check (OBQC)

OBQC leverages the semantic definitions in the ontology to verify the validity of SPARQL queries generated by LLMs. It applies rule-based checks to the query's components:

  • Domain and Range Rules: These rules validate that the subjects and objects of triples used in queries comply with the ontology's rdfs:domain and rdfs:range declarations. Violations indicate mismatches in expected types or structural logic.
  • Double Domain and Range Rules: These rules check for conflicts when multiple properties impose conditions on a shared subject or object. If two properties prescribe conflicting definitions, corrections are initiated based on the ontological hierarchy.
  • Incorrect Property Checks: This verification ensures that every property used in a query exists in the ontology, rejecting properties that are not defined.
  • IRI vs. Human-Readable Checks in SELECT Clauses: These checks ensure that variables intended for display return human-readable values rather than raw IRIs, which lack meaningful context for business users.

Figure 1: Overview of our Ontology-based Query Checker and LLM Repair approach.
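The domain and range rules above can be sketched as a small validator. This is a hypothetical illustration, not the paper's implementation: the ontology is reduced to a dict mapping each property to its rdfs:domain and rdfs:range, and a triple pattern is checked against it, returning an error explanation (for the repair step) or None.

```python
# Hypothetical sketch of an OBQC-style domain/range check. The ontology
# is modeled as a dict of property -> rdfs:domain / rdfs:range classes;
# property and class names are invented for illustration.
ONTOLOGY = {
    "ex:claimAmount": {"domain": "ex:Claim", "range": "xsd:decimal"},
    "ex:filedBy":     {"domain": "ex:Claim", "range": "ex:Member"},
}

def check_triple(subject_class, prop, object_class):
    """Return None if the triple pattern respects the ontology,
    otherwise a human-readable explanation for the LLM Repair step."""
    defn = ONTOLOGY.get(prop)
    if defn is None:
        # Incorrect Property Check: the property is not in the ontology.
        return f"Property {prop} is not defined in the ontology."
    if subject_class != defn["domain"]:
        # Domain rule: the subject's class must match rdfs:domain.
        return (f"Subject of {prop} is typed {subject_class}, "
                f"but its rdfs:domain is {defn['domain']}.")
    if object_class != defn["range"]:
        # Range rule: the object's class must match rdfs:range.
        return (f"Object of {prop} is typed {object_class}, "
                f"but its rdfs:range is {defn['range']}.")
    return None
```

For example, a query that attaches `ex:filedBy` to an `ex:Policy` subject would be flagged with a domain-rule explanation, while `check_triple("ex:Claim", "ex:filedBy", "ex:Member")` passes.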

LLM Repair

Upon identification of incorrect queries by OBQC, explanations are generated on why the current SPARQL query doesn't align with the ontology. This explanation, alongside the incorrect query, is used to prompt the LLM to generate a corrected version. This iterative process continues until a valid query is formed or a predefined attempt limit is reached, beyond which an "unknown" state is acknowledged.
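The check-and-repair loop described above can be sketched as follows. All names here are hypothetical stand-ins: `generate_query` and `repair_query` represent LLM calls, `obqc_check` returns None for a valid query or an error explanation, and the attempt limit is an assumed value.

```python
# Hedged sketch of the OBQC + LLM Repair loop (function names and the
# attempt limit are assumptions, not the paper's implementation).
MAX_ATTEMPTS = 3

def answer(question, generate_query, obqc_check, repair_query):
    query = generate_query(question)          # initial LLM-generated SPARQL
    for _ in range(MAX_ATTEMPTS):
        error = obqc_check(query)             # ontology-based validation
        if error is None:
            return query                      # valid: execute and answer
        # Re-prompt the LLM with the incorrect query and the explanation.
        query = repair_query(query, error)
    return "I don't know"                     # attempt limit exhausted
```

Returning "I don't know" rather than an unverified query is what yields the paper's reported 8% of acknowledged-unknown results instead of silent errors.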

Experimental Setup

The evaluation reuses the chat-with-the-data benchmark from the authors' previous work, covering four quadrants: Low/High Question Complexity crossed with Low/High Schema Complexity. GPT-4 generates SPARQL queries with zero-shot prompting, which OBQC then checks; when errors are found, the query is repaired iteratively as needed.

Results

The approach significantly improved the overall execution accuracy to 72%, with an error rate reduction to 20%.

Figure 2: Average Overall Execution Accuracy (AOEA) of SPARQL and SQL for all the questions in the benchmark.

  • Low Question/Low Schema Complexity: The most pronounced improvement, with the error rate reduced to 10.46%.
  • High Complexity Scenarios: Substantial accuracy gains, reflecting the robustness of ontology-based checks in handling intricate schema dependencies.

Figure 3: Average Overall Execution Accuracy (AOEA) of SPARQL and SQL for all questions in each quadrant.

The rule utility analysis revealed that most corrections involved domain rules, indicating that LLM errors occur most often at the subject position of a triple, with range (object-position) violations appearing more sporadically.

Conclusion

Integrating ontological checks into LLM QA systems delivers noticeable accuracy improvements. These systems can reliably decrease error rates by structurally enforcing semantic correctness during query generation. Future research may increase the expressiveness of the ontologies or extend the rule set to more expressive, logic-based OWL constructs. Overall, the ontology-augmented methods advance the goal of creating more trustworthy AI-driven data retrieval systems in organizational contexts, promising better adoption through enhanced reliability.
