Representing Pattern Matching Algorithms by Polynomial-Size Automata

(1607.00138)
Published Jul 1, 2016 in cs.DS and cs.FL

Abstract

Pattern matching algorithms to find exact occurrences of a pattern $S\in\Sigma^m$ in a text $T\in\Sigma^n$ have been analyzed extensively with respect to asymptotic best, worst, and average case runtime. For more detailed analyses, the number of text character accesses $X^{\mathcal{A},S}_n$ performed by an algorithm $\mathcal{A}$ when searching a random text of length $n$ for a fixed pattern $S$ has been considered. Constructing a state space and corresponding transition rules (e.g. in a Markov chain) that reflect the behavior of a pattern matching algorithm is a key step in existing analyses of $X^{\mathcal{A},S}_n$ in both the asymptotic ($n\to\infty$) and the non-asymptotic regime. The size of this state space is hence a crucial parameter for such analyses. In this paper, we introduce a general methodology to construct corresponding state spaces and demonstrate that it applies to a wide range of algorithms, including Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Backward Oracle Matching (BOM), and Backward (Non-Deterministic) DAWG Matching (B(N)DM). In all cases except BOM, our method leads to state spaces of size $O(m^3)$ for pattern length $m$, a result that has previously only been obtained for BMH. In all other cases, only state spaces with size exponential in $m$ had been reported. Our results immediately imply an algorithm to compute the distribution of $X^{\mathcal{A},S}_n$ for fixed $S$, fixed $n$, and $\mathcal{A}\in\{\text{BM},\text{BMH},\text{B(N)DM}\}$ in polynomial time for a very general class of random text models.
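To make the quantity $X^{\mathcal{A},S}_n$ concrete, the following is a minimal sketch for $\mathcal{A}=\text{BMH}$. It is not the paper's automaton construction: it simply runs a textbook Horspool search, counts text character accesses, and then obtains the distribution by brute-force enumeration of all $|\Sigma|^n$ texts under an assumed uniform i.i.d. text model. The function names, the counting convention (one access per character comparison, with the shift lookup reusing the character already read), and the uniform model are illustrative assumptions. The enumeration is exponential in $n$, which is exactly the cost a polynomial-size state space is meant to avoid.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_access_count(pattern, text):
    """Run Boyer-Moore-Horspool; return (match positions, text character accesses).

    Assumed counting convention: each comparison against a text character
    counts as one access; the shift lookup reuses the rightmost window
    character, which was already read in the first comparison.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    # Shift table: distance from the last occurrence of each character
    # in pattern[:-1] to the end of the pattern.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    matches, accesses, pos = [], 0, 0
    while pos <= n - m:
        j = m - 1
        # Compare the window against the pattern from right to left.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Characters absent from the table shift the window by m.
        pos += shift.get(text[pos + m - 1], m)
    return matches, accesses


def access_distribution(pattern, alphabet, n):
    """Exact distribution of the access count over all texts of length n.

    Assumes a uniform i.i.d. text model (hypothetical choice for this sketch).
    """
    counts = Counter(
        horspool_access_count(pattern, "".join(chars))[1]
        for chars in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```

For example, `horspool_access_count("ab", "abab")` visits two windows and compares both characters in each, and `access_distribution("ab", "ab", 4)` enumerates all 16 binary texts of length 4.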
