Fast Longest Common Extensions in Small Space (1607.06660v1)
Abstract: In this paper we address the longest common extension (LCE) problem: to compute the length $\ell$ of the longest common prefix between any two suffixes of $T\in \Sigma^n$ with $\Sigma = \{0, \ldots, \sigma-1\}$. We present two fast and space-efficient solutions based on (Karp-Rabin) \textit{fingerprinting} and \textit{sampling}. Our first data structure exploits properties of Mersenne prime numbers when used as moduli of the Karp-Rabin hash function and takes $n\lceil \log_2\sigma\rceil$ bits of space. Our second structure works with any prime modulus and takes $n\lceil \log_2\sigma\rceil + n/w + w\log_2 n$ bits of space ($w$ memory-word size). Both structures support $\mathcal O\left(m\log\sigma/w \right)$-time extraction of any length-$m$ text substring, $\mathcal O(\log\ell)$-time LCE queries with high probability, and can be built in optimal $\mathcal O(n)$ time. In the first case, ours is the first result showing that it is possible to answer LCE queries in $o(n)$ time while using only $\mathcal O(1)$ words on top of the space required to store the text. Our results improve the state of the art in space usage, query times, and preprocessing times and are extremely practical: we present a C++ implementation that is very fast and space-efficient in practice.
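To illustrate how fingerprinting can yield $\mathcal O(\log\ell)$-time LCE queries, the following is a minimal Python sketch, not the paper's data structure. It assumes explicit arrays of prefix fingerprints and base powers, so it uses $\mathcal O(n)$ extra words rather than the in-place space bounds stated above. The class name `LCESketch`, the fixed base `BASE`, and the modulus $2^{61}-1$ are illustrative assumptions; the Mersenne modulus is chosen only to echo the abstract, and the sketch does not exploit its special properties. A query runs an exponential search followed by a binary search over fingerprint comparisons, and its answer is correct unless a fingerprint collision occurs.

```python
# Illustrative sketch only: explicit prefix fingerprints, O(n) extra words.
# Q, BASE and LCESketch are assumptions for this example, not the paper's API.
Q = (1 << 61) - 1      # a Mersenne prime, used here as a plain prime modulus
BASE = 1_000_003       # hypothetical fixed base; normally chosen at random


class LCESketch:
    def __init__(self, text: bytes):
        self.n = len(text)
        # prefix[i] is the Karp-Rabin fingerprint of text[0:i]
        self.prefix = [0] * (self.n + 1)
        # power[i] is BASE**i mod Q
        self.power = [1] * (self.n + 1)
        for i, c in enumerate(text):
            # c + 1 keeps every symbol value nonzero
            self.prefix[i + 1] = (self.prefix[i] * BASE + c + 1) % Q
            self.power[i + 1] = (self.power[i] * BASE) % Q

    def fingerprint(self, i: int, length: int) -> int:
        """Fingerprint of text[i:i+length], in O(1) time."""
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % Q

    def _eq(self, i: int, j: int, length: int) -> bool:
        # Equal substrings always have equal fingerprints; unequal ones
        # collide only with small probability.
        return self.fingerprint(i, length) == self.fingerprint(j, length)

    def lce(self, i: int, j: int) -> int:
        """Length of the longest common prefix of suffixes i and j."""
        max_len = self.n - max(i, j)
        if i == j:
            return max_len
        # Exponential search: double k while the length-k prefixes match.
        k = 1
        while k <= max_len and self._eq(i, j, k):
            k *= 2
        lo = k // 2                 # length lo is known to match
        hi = min(k, max_len + 1)    # length hi mismatches or is out of range
        # Binary search between the last match and the first mismatch.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self._eq(i, j, mid):
                lo = mid
            else:
                hi = mid
        return lo
```

For example, on the text `abracadabra`, `LCESketch(b"abracadabra").lce(0, 7)` compares the suffixes `abracadabra` and `abra`. Both search phases perform $\mathcal O(\log\ell)$ fingerprint comparisons because the exponential phase stops once `k` exceeds $\ell$.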