Emergent Mind

State Complexity of Pattern Matching in Regular Languages

(1806.04645)
Published Jun 12, 2018 in cs.FL

Abstract

In a simple pattern matching problem one has a pattern $w$ and a text $t$, which are words over a finite alphabet $\Sigma$. One may ask whether $w$ occurs in $t$, and if so, where? More generally, we may have a set $P$ of patterns and a set $T$ of texts, where $P$ and $T$ are regular languages. We are interested in whether any word of $T$ begins with a word of $P$, ends with a word of $P$, has a word of $P$ as a factor, or has a word of $P$ as a subsequence. Thus we are interested in the languages $(P\Sigma^*)\cap T$, $(\Sigma^*P)\cap T$, $(\Sigma^*P\Sigma^*)\cap T$, and $(\Sigma^* \mathbin{\operatorname{shu}} P)\cap T$, where $\operatorname{shu}$ is the shuffle operation. The state complexity $\kappa(L)$ of a regular language $L$ is the number of states in the minimal deterministic finite automaton recognizing $L$. We derive the following upper bounds on the state complexities of our pattern-matching languages, where $\kappa(P)\le m$ and $\kappa(T)\le n$: $\kappa((P\Sigma^*)\cap T) \le mn$; $\kappa((\Sigma^*P)\cap T) \le 2^{m-1}n$; $\kappa((\Sigma^*P\Sigma^*)\cap T) \le (2^{m-2}+1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} P)\cap T) \le (2^{m-2}+1)n$. We prove that these bounds are tight, and that to meet them, the alphabet must have at least two letters in the first three cases, and at least $m-1$ letters in the last case. We also consider the special case where $P$ is a single word $w$, and obtain the following tight upper bounds: $\kappa((w\Sigma^*)\cap T_n) \le m+n-1$; $\kappa((\Sigma^*w)\cap T_n) \le (m-1)n-(m-2)$; $\kappa((\Sigma^*w\Sigma^*)\cap T_n) \le (m-1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} w)\cap T_n) \le (m-1)n$. For unary languages, we have a tight upper bound of $m+n-2$ in all eight of the aforementioned cases.
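As a rough illustration of the four matching relations named in the abstract, the sketch below checks them for a single pattern word and a single text word, and transcribes the stated bounds as plain arithmetic. The function names are hypothetical and are not taken from the paper. The bound helpers only evaluate the formulas as written in the abstract; they assume $m \ge 2$ so that $2^{m-2}$ is an integer, and the exact ranges of $m$ and $n$ for which the bounds are tight are given in the paper, not here.

```python
def is_prefix(w: str, t: str) -> bool:
    """t begins with w, i.e. t is in w Sigma^*."""
    return t.startswith(w)


def is_suffix(w: str, t: str) -> bool:
    """t ends with w, i.e. t is in Sigma^* w."""
    return t.endswith(w)


def is_factor(w: str, t: str) -> bool:
    """w occurs contiguously in t, i.e. t is in Sigma^* w Sigma^*."""
    return w in t


def is_subsequence(w: str, t: str) -> bool:
    """w occurs in t with gaps allowed, i.e. t is in Sigma^* shu w."""
    remaining = iter(t)
    # `c in remaining` consumes the iterator up to the first match,
    # so the letters of w must appear in t in order.
    return all(c in remaining for c in w)


def general_bounds(m: int, n: int) -> dict:
    """Bounds from the abstract for regular P and T (assumes m >= 2)."""
    return {
        "prefix": m * n,
        "suffix": 2 ** (m - 1) * n,
        "factor": (2 ** (m - 2) + 1) * n,
        "subsequence": (2 ** (m - 2) + 1) * n,
    }


def single_word_bounds(m: int, n: int) -> dict:
    """Bounds from the abstract when P is a single word w (assumes m >= 2)."""
    return {
        "prefix": m + n - 1,
        "suffix": (m - 1) * n - (m - 2),
        "factor": (m - 1) * n,
        "subsequence": (m - 1) * n,
    }
```

For example, `is_subsequence("ace", "abcde")` holds while `is_factor("ace", "abcde")` does not, which shows why the subsequence language is defined through the shuffle with $\Sigma^*$ rather than through concatenation.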
