Emergent Mind

Abstract

The \emph{longest common extension} (\emph{LCE}) problem is to preprocess a given string $w$ of length $n$ so that the length of the longest common prefix between suffixes of $w$ that start at any two given positions is answered quickly. In this paper, we present a data structure of $O(z \tau2 + \frac{n}{\tau})$ words of space which answers LCE queries in $O(1)$ time and can be built in $O(n \log \sigma)$ time, where $1 \leq \tau \leq \sqrt{n}$ is a parameter, $z$ is the size of the Lempel-Ziv 77 factorization of $w$ and $\sigma$ is the alphabet size. This is an \emph{encoding} data structure, i.e., it does not access the input string $w$ when answering queries and thus $w$ can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following: - For highly repetitive strings where the $z\tau2$ term is dominated by $\frac{n}{\tau}$, we obtain a \emph{constant-time and sub-linear space} LCE query data structure. - Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a \emph{constant-time and sub-linear space} LCE data structure for suitable $\tau$ and for $\sigma \leq 2{o(\log n)}$. - The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] can be "surpassed" in some cases with our LCE data structure.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.