
Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling

arXiv:1805.01310
Published May 3, 2018 in cs.DS and cs.CC

Abstract

The transversal hypergraph problem is the task of enumerating the minimal hitting sets of a hypergraph. It is a long-standing open question whether this can be done in output-polynomial time. For hypergraphs whose solutions have bounded size, Eiter and Gottlob [SICOMP 1995] gave an algorithm that runs in output-polynomial time, but whose space requirement also scales with the output size. We improve this to polynomial delay and polynomial space. More generally, we present an algorithm that on $n$-vertex, $m$-edge hypergraphs has delay $O(m^{k^*+1} n^2)$ and uses $O(mn)$ space, where $k^*$ is the maximum size of any minimal hitting set. Our algorithm is oblivious to $k^*$, a quantity that is hard to compute or even approximate. Central to our approach is the extension problem for minimal hitting sets: deciding, for a set $X$ of vertices, whether it is contained in any solution. With $|X|$ as parameter, we show that this is one of the first natural problems to be complete for the complexity class $W[3]$. We give an algorithm for the extension problem running in time $O(m^{|X|+1} n)$. We also prove a conditional lower bound under the Strong Exponential Time Hypothesis, showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
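The abstract's ingredients (an extension oracle, enumeration driven by it, and the data-profiling application) can be illustrated with a minimal Python sketch. It assumes the standard "private edge" characterization: after deleting an excluded set $Y$ from every edge, a vertex set $X$ lies in some minimal hitting set avoiding $Y$ exactly when one can pick, for each $x \in X$, an edge $E_x$ with $E_x \cap X = \{x\}$ such that no edge lies inside $\bigcup_{x \in X} (E_x \setminus \{x\})$. Trying all at most $m^{|X|}$ such choices, each checked in roughly $O(mn)$ time, is consistent with the stated $O(m^{|X|+1} n)$ bound, but this is a reconstruction, not necessarily the authors' exact procedure. The enumerator is a flashlight (backtracking) search that calls the oracle once per branch. The helper for unique column combinations (UCCs) assumes each edge is the set of columns on which some pair of rows differs, so that minimal hitting sets are exactly minimal UCCs. All function names are hypothetical.

```python
from itertools import combinations, product
from typing import FrozenSet, Hashable, Iterator, List, Sequence, Set

Edge = FrozenSet[Hashable]


def extendable(edges: List[Edge], x: Set, excluded: Set) -> bool:
    """Decide whether some minimal hitting set S has x <= S and S disjoint from excluded."""
    if x & excluded:
        return False
    # Forbidding vertices is the same as deleting them from every edge.
    trimmed = [e - excluded for e in edges]
    xs = list(x)
    # A candidate private edge for v meets x in exactly {v}.
    candidates = [[e for e in trimmed if (e & x) == {v}] for v in xs]
    # Try every combination of private edges: at most m^|x| of them.
    for choice in product(*candidates):
        # S must avoid the rest of each chosen edge to keep that edge private.
        forbidden = set().union(*(e - {v} for e, v in zip(choice, xs)))
        # The remaining vertices form a hitting set iff no edge lies inside
        # `forbidden`; shrinking them to a minimal one never drops a vertex of x.
        if not any(e <= forbidden for e in trimmed):
            return True
    return False


def minimal_hitting_sets(edges: List[Edge], order: Sequence) -> Iterator[FrozenSet]:
    """Flashlight search; `order` must list every vertex that occurs in `edges`."""
    def search(i: int, x: Set, excluded: Set) -> Iterator[FrozenSet]:
        if i == len(order):
            # All vertices are decided and (x, excluded) is still extendable,
            # so x itself is a minimal hitting set.
            yield frozenset(x)
            return
        v = order[i]
        if extendable(edges, x | {v}, excluded):
            yield from search(i + 1, x | {v}, excluded)
        if extendable(edges, x, excluded | {v}):
            yield from search(i + 1, x, excluded | {v})

    if extendable(edges, set(), set()):
        yield from search(0, set(), set())


def difference_hypergraph(rows: Sequence[Sequence], columns: Sequence[str]) -> List[Edge]:
    """One edge per pair of rows: the columns on which the two rows differ."""
    edges = {frozenset(c for c, a, b in zip(columns, r1, r2) if a != b)
             for r1, r2 in combinations(rows, 2)}
    return list(edges)


columns = ["first", "last", "city"]
rows = [("Ada", "Lovelace", "London"),
        ("Ada", "Byron", "Paris"),
        ("Alan", "Byron", "London")]
uccs = minimal_hitting_sets(difference_hypergraph(rows, columns), sorted(columns))
print(sorted(sorted(s) for s in uccs))
# [['city', 'first'], ['city', 'last'], ['first', 'last']]
```

Because the search only enters a branch after the oracle confirms it leads to a solution, no branch is a dead end: consecutive outputs are separated by at most $2n$ oracle calls, each on a set of size at most $k^*+1$. The paper's tighter delay accounting is not reproduced by this sketch. The full difference-set hypergraph has quadratically many edges in the number of rows; keeping only its inclusion-minimal edges leaves the minimal hitting sets unchanged.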
