Practical KMP/BM Style Pattern-Matching on Indeterminate Strings (2204.08331v3)

Published 18 Apr 2022 in cs.DS

Abstract: In this paper we describe two simple, fast, space-efficient algorithms for finding all matches of an indeterminate pattern $p = p[1..m]$ in an indeterminate string $x = x[1..n]$, where both $p$ and $x$ are defined on a "small" ordered alphabet $\Sigma$ $-$ say, $\sigma = |\Sigma| \le 9$. Both algorithms depend on a preprocessing phase that replaces $\Sigma$ by an integer alphabet $\Sigma_I$ of size $\sigma_I = \sigma$ which (reversibly, in time linear in string length) maps both $x$ and $p$ into equivalent regular strings $y$ and $q$, respectively, on $\Sigma_I$, whose maximum (indeterminate) letter can be expressed in a 32-bit word (for $\sigma \le 4$, thus for DNA sequences, an 8-bit representation suffices). We first describe an efficient version KMP Indet of the venerable Knuth-Morris-Pratt algorithm to find all occurrences of $q$ in $y$ (that is, of $p$ in $x$), but, whenever necessary, using the prefix array, rather than the border array, to control shifts of the transformed pattern $q$ along the transformed string $y$. We go on to describe a similar efficient version BM Indet of the Boyer- Moore algorithm that turns out to execute significantly faster than KMP Indet over a wide range of test cases. A noteworthy feature is that both algorithms require very little additional space: $\Theta(m)$ words. We conjecture that a similar approach may yield practical and efficient indeterminate equivalents to other well-known pattern-matching algorithms, in particular the several variants of Boyer-Moore.

Citations (3)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Related Papers

On the Practical Power of Automata in Pattern Matching (2022)
PalFM-index: FM-index for Palindrome Pattern Matching (2022)
The Parameterized Suffix Tray (2020)
Representing Pattern Matching Algorithms by Polynomial-Size Automata (2016)
Fast Searching in Packed Strings (2009)