String Indexing for Top-$k$ Close Consecutive Occurrences

(2007.04128)
Published Jul 8, 2020 in cs.DS

Abstract

The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ and $k$. We give three time-space trade-offs for the problem. Let $n$ be the length of $S$, $m$ the length of $P$, and $\epsilon\in(0,1]$. Our first result achieves $O(n\log n)$ space and optimal query time of $O(m+k)$. Our second and third results achieve linear space and query times either $O(m+k^{1+\epsilon})$ or $O(m + k \log^{1+\epsilon} n)$. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
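
To make the problem definition concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure and achieves none of the stated bounds: it scans $S$ at query time instead of using a preprocessed index. The function names, the 0-based positions, and the tie-breaking rule (smaller $i$ first among pairs of equal distance) are assumptions made for illustration only.

```python
import heapq

def occurrences(s, p):
    """Return all 0-based start positions of p in s, overlapping ones included."""
    positions = []
    start = s.find(p)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences are also found.
        start = s.find(p, start + 1)
    return positions

def top_k_close_consecutive(s, p, k):
    """Naive baseline: the k consecutive occurrences (i, j) of p in s with
    the smallest distance j - i. Ties are broken by smaller i (an assumption)."""
    pos = occurrences(s, p)
    # Neighbouring entries of the sorted occurrence list are exactly the
    # pairs with no occurrence of p strictly between them.
    pairs = list(zip(pos, pos[1:]))
    return heapq.nsmallest(k, pairs, key=lambda ij: (ij[1] - ij[0], ij[0]))
```

For example, with `s = "abababcab"` and `p = "ab"`, the occurrences are at positions 0, 2, 4 and 7, so the consecutive occurrences are (0, 2), (2, 4) and (4, 7) with distances 2, 2 and 3; a query with `k = 2` returns the first two pairs.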
