Co-lexicographically Ordering Automata and Regular Languages -- Part I

Published 9 Aug 2022 in cs.FL and cs.DS (arXiv:2208.04931v3)

Abstract: In the present work, we lay out a new theory showing that every automaton can be co-lexicographically partially ordered, and that an intrinsic measure of its complexity can be defined and effectively determined: the minimum width $p$ over its admissible co-lex partial orders, dubbed here the automaton's co-lex width. We first show that this new measure captures at once the complexity of several seemingly unrelated hard problems on automata. Any NFA of co-lex width $p$: (i) has an equivalent powerset DFA whose size is exponential in $p$ rather than (as a classic analysis shows) in the NFA's size; (ii) can be encoded using just $\Theta(\log p)$ bits per transition; (iii) admits a linear-space data structure solving regular expression matching queries in time proportional to $p^2$ per matched character. Two consequences of this new parametrization of automata are that PSPACE-hard problems such as NFA equivalence are fixed-parameter tractable (FPT) in $p$, and that quadratic lower bounds for the regular expression matching problem do not hold for sufficiently small $p$. We prove that a canonical minimum-width DFA accepting a language $\mathcal L$, dubbed the Hasse automaton $\mathcal H$ of $\mathcal L$, can be exhibited. Finally, we explore the relationship between two conflicting objectives: minimizing the width and minimizing the number of states of a DFA. In this context, we provide an analogue of the Myhill-Nerode Theorem for co-lexicographically ordered regular languages.
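
For readers unfamiliar with the underlying order, the following minimal sketch illustrates co-lexicographic comparison of plain strings: they are compared from the last character backwards, which is the same as comparing their reversals lexicographically. This is only an illustration of the order on strings, not the paper's construction on automaton states; the helper names `colex_key` and `colex_sorted` are hypothetical.

```python
from typing import Iterable, List


def colex_key(s: str) -> str:
    # Assumption for illustration: co-lex order on strings is taken to be
    # lexicographic order on the reversed strings.
    return s[::-1]


def colex_sorted(words: Iterable[str]) -> List[str]:
    # Sort words by comparing them right to left.
    return sorted(words, key=colex_key)


if __name__ == "__main__":
    # Words ending in 'a' come before words ending in 'b'.
    print(colex_sorted(["ab", "ba", "aa", "b", "a", "bb"]))
    # prints ['a', 'aa', 'ba', 'b', 'ab', 'bb']
```

Note that "ba" precedes "b" here because its last character is "a", whereas in ordinary lexicographic order "b" would come first.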