
State Complexity of Pattern Matching in Regular Languages (1806.04645v2)

Published 12 Jun 2018 in cs.FL

Abstract: In a simple pattern matching problem one has a pattern $w$ and a text $t$, which are words over a finite alphabet $\Sigma$. One may ask whether $w$ occurs in $t$, and if so, where? More generally, we may have a set $P$ of patterns and a set $T$ of texts, where $P$ and $T$ are regular languages. We are interested whether any word of $T$ begins with a word of $P$, ends with a word of $P$, has a word of $P$ as a factor, or has a word of $P$ as a subsequence. Thus we are interested in the languages $(P\Sigma^*)\cap T$, $(\Sigma^*P)\cap T$, $(\Sigma^* P\Sigma^*)\cap T$, and $(\Sigma^* \mathbin{\operatorname{shu}} P)\cap T$, where $\operatorname{shu}$ is the shuffle operation. The state complexity $\kappa(L)$ of a regular language $L$ is the number of states in the minimal deterministic finite automaton recognizing $L$. We derive the following upper bounds on the state complexities of our pattern-matching languages, where $\kappa(P)\le m$, and $\kappa(T)\le n$: $\kappa((P\Sigma^*)\cap T) \le mn$; $\kappa((\Sigma^*P)\cap T) \le 2^{m-1}n$; $\kappa((\Sigma^*P\Sigma^*)\cap T) \le (2^{m-2}+1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} P)\cap T) \le (2^{m-2}+1)n$. We prove that these bounds are tight, and that to meet them, the alphabet must have at least two letters in the first three cases, and at least $m-1$ letters in the last case. We also consider the special case where $P$ is a single word $w$, and obtain the following tight upper bounds: $\kappa((w\Sigma^*)\cap T_n) \le m+n-1$; $\kappa((\Sigma^*w)\cap T_n) \le (m-1)n-(m-2)$; $\kappa((\Sigma^*w\Sigma^*)\cap T_n) \le (m-1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} w)\cap T_n) \le (m-1)n$. For unary languages, we have a tight upper bound of $m+n-2$ in all eight of the aforementioned cases.
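To get a feel for how the stated bounds grow, the following minimal Python sketch tabulates them for given $m$ and $n$. It only evaluates the formulas quoted in the abstract; it does not construct or minimize any automaton. The function name `pattern_matching_bounds`, the dictionary keys, and the restriction to $m \ge 2$ and $n \ge 1$ (so that $2^{m-2}$ is an integer) are assumptions made for this illustration, not part of the paper.

```python
def pattern_matching_bounds(m, n):
    """Evaluate the upper bounds quoted in the abstract.

    m bounds the state complexity of the pattern language,
    n bounds the state complexity of the text language.
    The name, keys, and the m >= 2 / n >= 1 guard are illustrative
    assumptions, not taken from the paper.
    """
    if m < 2 or n < 1:
        raise ValueError("this sketch assumes m >= 2 and n >= 1")

    # General case: P is an arbitrary regular language.
    general = {
        "prefix": m * n,                         # (P Sigma^*) cap T
        "suffix": 2 ** (m - 1) * n,              # (Sigma^* P) cap T
        "factor": (2 ** (m - 2) + 1) * n,        # (Sigma^* P Sigma^*) cap T
        "subsequence": (2 ** (m - 2) + 1) * n,   # (Sigma^* shu P) cap T
    }

    # Special case: P consists of a single word w.
    single_word = {
        "prefix": m + n - 1,
        "suffix": (m - 1) * n - (m - 2),
        "factor": (m - 1) * n,
        "subsequence": (m - 1) * n,
    }

    # Unary alphabet: one bound covers all eight cases.
    unary = m + n - 2

    return {"general": general, "single_word": single_word, "unary": unary}


if __name__ == "__main__":
    bounds = pattern_matching_bounds(4, 5)
    for case, value in bounds.items():
        print(case, value)
```

Comparing the `general` and `single_word` entries for the same $m$ and $n$ shows the gap the abstract describes: the suffix, factor, and subsequence bounds are exponential in $m$ for a general pattern language but only linear in $m$ when the pattern is a single word.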

Citations (2)
