Technical Report: Match-reference regular expressions and lenses (2212.04439v1)

Published 8 Dec 2022 in cs.PL

Abstract: A lens is a single program that specifies two data transformations at once: one transformation converts data from source format to target format and a second transformation inverts the process. Over the past decade, researchers have developed many different kinds of lenses with different properties. One class of such languages operate over regular languages. In other words, these lenses convert strings drawn from one regular language to strings drawn from another regular language (and back again). In this paper, we define a more powerful language of lenses, which we call match-reference lenses, that is capable of translating between non-regular formats that contain repeated substrings, which is a primitive form of dependency. To define the non-regular formats themselves, we develop a new language, match-reference regular expressions, which are regular expressions that can bind variables to substrings and use those substrings repeatedly. These match-reference regular expressions are closely related to the familiar ``back-references" that can be found in traditional regular expression packages, but are redesigned to adhere to conventional programming language lexical scoping conventions and to interact smoothly with lens language infrastructure. We define the semantics of match-reference regular expressions and match-reference lenses. We also define a new kind of automaton, the match-reference regex automaton system (MRRAS), for deciding string membership in the language match-reference regular expressions. We illustrate our definitions with a variety of examples.