Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

From Regexes to Parsing Expression Grammars (1210.4992v1)

Published 17 Oct 2012 in cs.FL and cs.PL

Abstract: Most scripting languages nowadays use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an informal semantics that is different from the semantics of regular expressions, removing the commutativity of alternation and adding ad-hoc extensions that cannot be expressed by formalisms for efficient recognition of regular languages, such as deterministic finite automata. Parsing Expression Grammars are a formalism that can describe all deterministic context-free languages and has a simple computational model. In this paper, we present a formalization of regexes via transformation to Parsing Expression Grammars. The proposed transformation easily accommodates several of the common regex extensions, giving a formal meaning to them. It also provides a clear computational model that helps to estimate the efficiency of regex-based matchers, and a basis for specifying provably correct optimizations for them.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
Citations (26)

Summary

We haven't generated a summary for this paper yet.