State Complexity of Pattern Matching in Regular Languages (1806.04645v2)
Abstract: In a simple pattern matching problem one has a pattern $w$ and a text $t$, which are words over a finite alphabet $\Sigma$. One may ask whether $w$ occurs in $t$, and if so, where? More generally, we may have a set $P$ of patterns and a set $T$ of texts, where $P$ and $T$ are regular languages. We are interested whether any word of $T$ begins with a word of $P$, ends with a word of $P$, has a word of $P$ as a factor, or has a word of $P$ as a subsequence. Thus we are interested in the languages $(P\Sigma^*)\cap T$, $(\Sigma^*P)\cap T$, $(\Sigma^* P\Sigma^*)\cap T$, and $(\Sigma^* \mathbin{\operatorname{shu}} P)\cap T$, where $\operatorname{shu}$ is the shuffle operation. The state complexity $\kappa(L)$ of a regular language $L$ is the number of states in the minimal deterministic finite automaton recognizing $L$. We derive the following upper bounds on the state complexities of our pattern-matching languages, where $\kappa(P)\le m$, and $\kappa(T)\le n$: $\kappa((P\Sigma^*)\cap T) \le mn$; $\kappa((\Sigma^*P)\cap T) \le 2^{m-1}n$; $\kappa((\Sigma^*P\Sigma^*)\cap T) \le (2^{m-2}+1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} P)\cap T) \le (2^{m-2}+1)n$. We prove that these bounds are tight, and that to meet them, the alphabet must have at least two letters in the first three cases, and at least $m-1$ letters in the last case. We also consider the special case where $P$ is a single word $w$, and obtain the following tight upper bounds: $\kappa((w\Sigma^*)\cap T_n) \le m+n-1$; $\kappa((\Sigma^*w)\cap T_n) \le (m-1)n-(m-2)$; $\kappa((\Sigma^*w\Sigma^*)\cap T_n) \le (m-1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} w)\cap T_n) \le (m-1)n$. For unary languages, we have a tight upper bound of $m+n-2$ in all eight of the aforementioned cases.