Faster Approximate String Matching for Short Patterns (0811.3490v2)
Abstract: We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have lengths $m$ and $n$, respectively. On a standard unit-cost word RAM with word size $w \geq \log n$ we present an algorithm using time $$ O\left(nk \cdot \min\left(\frac{\log^2 m}{\log n},\frac{\log^2 m \log w}{w}\right) + n\right). $$ When $P$ is short, namely, $m = 2^{o(\sqrt{\log n})}$ or $m = 2^{o(\sqrt{w/\log w})}$, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.
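To make the problem statement concrete, the following is a minimal sketch of the textbook $O(nm)$ dynamic-programming baseline for this problem. It is not the algorithm of the paper, which builds on Landau-Vishkin with tabulation and word-level parallelism. The function name `approx_match_end_positions` and the choice of 1-indexed ending positions are assumptions made for illustration.

```python
def approx_match_end_positions(P, Q, k):
    """Return 1-indexed ending positions j in Q such that some substring
    of Q ending at j has edit distance at most k to P.

    Baseline O(n*m) dynamic program, shown only to illustrate the
    problem; it is NOT the paper's algorithm. Only non-empty prefixes
    of Q (j >= 1) are reported.
    """
    m = len(P)
    # col[i] = min edit distance between P[:i] and any substring of Q
    # ending at the current position (initially the empty prefix of Q).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(Q, start=1):
        prev_diag = col[0]  # value of row 0 in the previous column
        col[0] = 0          # a match may start anywhere in Q at no cost
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (P[i - 1] != c),  # match or substitution
                old + 1,                      # extra character in Q
                col[i - 1] + 1,               # extra character in P
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_match_end_positions("abc", "xabcx", 0))  # [4]
print(approx_match_end_positions("abc", "xabcx", 1))  # [3, 4, 5]
```

With $k = 1$, position 3 is reported because the substring "ab" is one insertion away from "abc", and position 5 because "abcx" is one deletion away.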