A Fast Heuristic for Exact String Matching

(1512.03512)
Published Dec 11, 2015 in cs.DS

Abstract

Given a pattern string $P$ of length $n$ consisting of $\delta$ distinct characters and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet $\Sigma$ of size $\Delta$, the exact string matching problem consists of finding all occurrences of $P$ in $T$. For this problem, we present a randomized heuristic that preprocesses $P$ in $O(n\delta)$ time to identify $\mathrm{sparse}(P)$, a rarely occurring substring of $P$, and then uses it to find all occurrences of $P$ in $T$ efficiently. This heuristic has an expected search time of $O\left(\frac{m}{\min(|\mathrm{sparse}(P)|, \Delta)}\right)$, where $|\mathrm{sparse}(P)|$ is at least $\delta$. We also show that for a pattern string $P$ whose characters are chosen uniformly at random from an alphabet of size $\Delta$, $E[|\mathrm{sparse}(P)|]$ is $\Omega\left(\Delta \log\left(\frac{2\Delta}{2\Delta-\delta}\right)\right)$.
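The abstract does not say how $\mathrm{sparse}(P)$ is constructed or how the search uses it, so the following is only a minimal sketch of the general filter-then-verify idea, not the paper's algorithm. It assumes the caller has already chosen some substring of $P$ to act as the rare anchor. The function name `find_with_anchor` and its `start` and `length` parameters are hypothetical, and the anchor is located with Python's built-in `str.find` rather than the paper's randomized search.

```python
def find_with_anchor(pattern: str, text: str, start: int, length: int) -> list[int]:
    """Return every index where `pattern` occurs in `text`.

    Illustrative only: pattern[start:start + length] stands in for a rarely
    occurring substring of the pattern. How the paper actually selects
    sparse(P) is not described in the abstract, so the caller supplies it.
    """
    if not pattern or length <= 0 or start < 0 or start + length > len(pattern):
        raise ValueError("anchor must be a non-empty substring of the pattern")

    anchor = pattern[start:start + length]
    matches = []

    # Step 1 (filter): scan the text for the anchor only.
    pos = text.find(anchor)
    while pos != -1:
        # Step 2 (verify): align the full pattern around this anchor hit.
        candidate = pos - start
        if candidate >= 0 and text.startswith(pattern, candidate):
            matches.append(candidate)
        # Advance by one so overlapping occurrences are not missed.
        pos = text.find(anchor, pos + 1)

    return matches


if __name__ == "__main__":
    # The anchor here is the single character "c" at offset 2 of the pattern.
    print(find_with_anchor("abcab", "xabcabcabz", start=2, length=1))
```

Every occurrence of the pattern contains the anchor at offset `start`, so checking only the anchor hits cannot miss a match. The rarer the anchor is in the text, the fewer full verifications are needed.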
