Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity (2204.06618v1)

Published 13 Apr 2022 in cs.CC, cs.AI, cs.CL, cs.FL, and cs.LG

Abstract: This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.

Summary

The paper shows that UHAT and GUHAT are limited to recognizing languages within the AC⁰ complexity class.
It demonstrates that AHAT can recognize non-AC⁰ languages such as MAJORITY and DYCK-1, so AHAT recognizes languages that UHAT and GUHAT cannot.
The study bridges Transformer neural architectures with circuit complexity theory, informing practical NLP applications and future research.

Analyzing Transformer Language Recognition via Circuit Complexity

In the paper Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity, Yiding Hao, Dana Angluin, and Robert Frank study what Transformer encoders with different forms of hard self-attention can and cannot recognize when viewed as acceptors of formal languages. The paper examines three models: unique hard attention (UHAT), generalized unique hard attention (GUHAT), and averaging hard attention (AHAT). It characterizes the expressive power of these variants in terms of circuit complexity.

Summary of Contributions

The primary contributions of the paper are formal definitions of the UHAT, GUHAT, and AHAT models and an analysis of their language recognition capacity relative to the complexity class AC$^0$. The authors show that UHAT and GUHAT Transformers can recognize only languages in AC$^0$, that is, languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes earlier results by Hahn (2020) that GUHAT cannot recognize the DYCK languages or PARITY, since those languages lie outside AC$^0$. In contrast, AHAT can recognize some non-AC$^0$ languages, such as MAJORITY and DYCK-1, so AHAT recognizes languages that the other two models cannot.
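The difference between selecting one position and averaging over tied positions can be sketched in a few lines of Python. This is a minimal illustration for a single attention step over scalar values, not the paper's formal definition: the function names are hypothetical, and breaking ties toward the leftmost position is an assumption made here for concreteness.

```python
from fractions import Fraction


def unique_hard_attention(scores, values):
    """Return the value at ONE maximum-score position.

    Assumption: ties are broken toward the leftmost position.
    """
    # max() returns the first index that attains the maximum score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return values[best]


def averaging_hard_attention(scores, values):
    """Return the uniform average of the values at ALL maximum-score positions."""
    top = max(scores)
    tied = [v for s, v in zip(scores, values) if s == top]
    return Fraction(sum(tied), len(tied))


scores = [1, 3, 3]
values = [10, 20, 30]
print(unique_hard_attention(scores, values))     # value from a single position
print(averaging_hard_attention(scores, values))  # mean over the tied positions
```

With the scores above, positions 1 and 2 tie for the maximum. The unique rule reads only one of them, while the averaging rule combines both, so its output can depend on how many positions share the maximum.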

Implications and Theoretical Insights

The formal definitions of the UHAT, GUHAT, and AHAT models connect Transformer capabilities to classical computational complexity. By placing UHAT and GUHAT inside AC$^0$, the paper gives a precise upper bound on what these hard attention models can recognize, and the AHAT results show that this bound does not extend to averaging hard attention.

The implications for practical applications within NLP are notable. Because UHAT and GUHAT are bounded by AC$^0$, the languages these models can recognize are inherently limited. Handling structures that fall outside AC$^0$ therefore calls for AHAT or some other attention mechanism.
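One way to see how averaging reaches a non-AC$^0$ language is a toy recognizer for MAJORITY. The sketch below is an illustration under stated assumptions, not the construction from the paper: it assumes MAJORITY means bit strings with strictly more 1s than 0s, it gives every position the same attention score so that all positions tie, and the function names are hypothetical.

```python
from fractions import Fraction


def majority_reference(w):
    """Reference definition (assumed): strictly more 1s than 0s."""
    return w.count("1") > w.count("0")


def majority_by_averaging(w):
    """Decide MAJORITY with one averaging step over tied positions."""
    if not w:
        return False  # the empty string has no majority of 1s
    scores = [0] * len(w)                       # every position ties for the maximum
    values = [1 if c == "1" else 0 for c in w]  # each position contributes its bit
    top = max(scores)
    tied = [v for s, v in zip(scores, values) if s == top]
    mean = Fraction(sum(tied), len(tied))       # fraction of 1s in the string
    return mean > Fraction(1, 2)                # threshold at one half


for w in ["", "1", "10", "110", "0101", "11100"]:
    print(repr(w), majority_by_averaging(w), majority_reference(w))
```

Because all positions tie, the averaged value is exactly the fraction of 1s in the input, and comparing it with one half decides membership. A rule that reads only a single position has no access to this count in one step.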

Moreover, the paper underscores the relevance of examining neural network models through the lens of classical computational theories, providing a bridge between the expressive theories of formal languages and practical machine learning architectures.

Future Directions

These results suggest several directions for future research. Studying the closure properties of the language classes recognized by GUHAT, UHAT, and AHAT could give a deeper understanding of these models. Another direction is to investigate how different positional encoding schemes affect the recognition power of Transformers.

Additionally, the potential extension of this framework to include soft attention models could help delineate their comparative expressivity, especially given existing constructions enabling soft attention to recognize languages such as DYCK-k. As understanding develops, this line of research promises to enhance both theoretical comprehension and practical implementation of NLP systems using Transformer architectures.

Overall, this work is a meaningful contribution to the theory of neural language recognition and sets the stage for more refined models in formal language processing with neural networks. Continued work along these lines could substantially refine our understanding of computation and language recognition in artificial intelligence.