Efficient Error-tolerant Search on Knowledge Graphs

Published 10 Sep 2016 in cs.DB | (1609.03095v4)

Abstract: Edge-labeled graphs are widely used to describe relationships between entities in a database. Given a query subgraph that represents an example of what the user is searching for, we study the problem of efficiently searching for similar subgraphs in a large data graph, where similarity is defined in terms of the well-known graph edit distance. We call these queries "error-tolerant exemplar queries" since matches are allowed despite small variations in the graph structure and the labels. The problem is computationally intractable in the general case, but efficient solutions are achievable for labeled graphs under a well-behaved distribution of labels, as commonly found in knowledge graphs. We propose two efficient exact algorithms, based on a filtering-and-verification framework, for finding subgraphs in a large data graph that are isomorphic to a query graph under some edit operations. Our filtering scheme, which uses the neighbourhood structure around a node and the presence or absence of paths, significantly reduces the number of candidates that are passed to the verification stage. Moreover, we analyze the costs of our algorithms and the conditions under which one algorithm is expected to outperform the other. Our analysis identifies some of the variables that affect the cost, including the number and selectivity of query edge labels and the degree of nodes in the data graph, and characterizes their relationships. We empirically evaluate the effectiveness of our filtering schemes and queries, the efficiency of our algorithms, and the reliability of our cost models on real datasets.
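
To make the filtering-and-verification idea concrete, the following is a minimal Python sketch and not the paper's algorithms. It assumes a simplified notion of edit distance that only counts query edges with no counterpart in the data graph, and it filters on incident edge labels only, leaving out the path-based part of the filter. All names (`incident_labels`, `passes_filter`, `search`) and the toy data are hypothetical.

```python
from collections import Counter

# A graph is a set of (source, label, target) triples.

def incident_labels(edges, node):
    """Multiset of (direction, label) pairs for edges touching `node`.
    A self-loop is counted once, as an outgoing edge."""
    labels = Counter()
    for s, l, t in edges:
        if s == node:
            labels[("out", l)] += 1
        elif t == node:
            labels[("in", l)] += 1
    return labels

def passes_filter(q_edges, q_node, d_edges, d_node, threshold):
    """Neighbourhood filter (assumed form): the number of query edge labels
    around q_node that d_node cannot cover must not exceed the threshold."""
    deficit = incident_labels(q_edges, q_node) - incident_labels(d_edges, d_node)
    return sum(deficit.values()) <= threshold

def graph_nodes(edges):
    return sorted({n for s, _, t in edges for n in (s, t)})

def search(q_edges, d_edges, threshold):
    """Return injective node mappings under which at most `threshold`
    query edges are absent from the data graph."""
    q_nodes = graph_nodes(q_edges)
    d_nodes = graph_nodes(d_edges)
    # Filtering stage: prune candidate data nodes for each query node.
    candidates = {
        q: [d for d in d_nodes if passes_filter(q_edges, q, d_edges, d, threshold)]
        for q in q_nodes
    }
    results = []

    def extend(i, mapping):
        if i == len(q_nodes):
            # Verification stage: count query edges missing under this mapping.
            missing = sum(
                (mapping[s], l, mapping[t]) not in d_edges for s, l, t in q_edges
            )
            if missing <= threshold:
                results.append(dict(mapping))
            return
        q = q_nodes[i]
        for d in candidates[q]:
            if d not in mapping.values():
                mapping[q] = d
                extend(i + 1, mapping)
                del mapping[q]

    extend(0, {})
    return results

# Toy data graph and query (illustrative only).
data = {
    ("alice", "worksAt", "acme"), ("alice", "livesIn", "paris"),
    ("bob", "worksAt", "acme"), ("bob", "livesIn", "rome"),
    ("acme", "locatedIn", "paris"),
}
query = {("p", "worksAt", "c"), ("p", "livesIn", "x"), ("c", "locatedIn", "x")}

exact_matches = search(query, data, 0)
tolerant_matches = search(query, data, 1)
```

With a threshold of 0 the search behaves like exact subgraph matching; raising the threshold admits mappings such as the one for `bob`, where one query edge has no counterpart. The filter is safe in this simplified setting because any label deficit around a node is a lower bound on the number of missing edges in every mapping that uses that node.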