Computing matching statistics on Wheeler DFAs (2301.05338v1)
Published 13 Jan 2023 in cs.DS
Abstract: Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time- and space-efficient algorithm for computing matching statistics that relies on some components of a compressed suffix tree, notably the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.
- Alessio Conte
- Nicola Cotumaccio
- Travis Gagie
- Giovanni Manzini
- Nicola Prezza
- Marinella Sciortino