How Can Self-Attention Networks Recognize Dyck-n Languages?

Published Oct 9, 2020 in cs.CL, cs.FL, and cs.LG


We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, a task that has been deemed difficult for these networks. We compare the performance of two variants of SA, one with a starting symbol ($\text{SA}^{+}$) and one without ($\text{SA}^{-}$). Our results show that $\text{SA}^{+}$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that $\text{SA}^{-}$ completely breaks down on long sequences, whereas the accuracy of $\text{SA}^{+}$ is 58.82\%. We find the attention maps learned by $\text{SA}^{+}$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is on par with that of LSTMs, which provides evidence of the ability of SA to learn hierarchies without recursion.
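
For readers unfamiliar with the task, the following is a minimal sketch of a classical stack-based recognizer for Dyck-n, the kind of reference procedure the abstract compares the attention maps against. It is not the paper's model or code. The function name `is_dyck`, the `PAIRS` mapping, and the choice of round and square brackets for $\mathcal{D}_2$ are illustrative assumptions.

```python
# Hypothetical bracket inventory for Dyck-2 (assumption: the paper's
# actual vocabulary may use different symbols).
PAIRS = {")": "(", "]": "["}


def is_dyck(sequence, pairs=PAIRS):
    """Return True if `sequence` is well balanced over the given bracket pairs.

    `pairs` maps each closing symbol to its matching opening symbol, so a
    mapping with n entries recognizes Dyck-n.
    """
    openers = set(pairs.values())
    stack = []
    for symbol in sequence:
        if symbol in openers:
            # An opening bracket is pushed and waits for its partner.
            stack.append(symbol)
        elif symbol in pairs:
            # A closing bracket must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[symbol]:
                return False
        else:
            # Any symbol outside the bracket alphabet is rejected.
            return False
    # The string is accepted only if every opener has been closed.
    return not stack
```

Under these assumptions, `is_dyck("([])[]")` accepts a nested string, while `is_dyck("([)]")` rejects one whose brackets cross. The stack depth reached during a run corresponds to the dependency depth mentioned in the abstract.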

