Improved Grammar-Based Compressed Indexes (1110.4493v1)
Abstract: We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right hands of the rules), a basic grammar-based representation of $T$ takes $N\lg n$ bits of space. Our representation requires $2N\lg n + N\lg u + \epsilon\, n\lg n + o(N\lg n)$ bits of space, for any $0<\epsilon \le 1$. It can find the positions of the $occ$ occurrences of a pattern of length $m$ in $T$ in $O((m2/\epsilon)\lg (\frac{\lg u}{\lg n}) +occ\lg n)$ time, and extract any substring of length $\ell$ of $T$ in time $O(\ell+h\lg(N/h))$, where $h$ is the height of the grammar tree.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.