Simulation of Graph Algorithms with Looped Transformers (2402.01107v3)
Abstract: The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for graphs of any size. Despite this property, we show that finite precision imposes a limit on the simulation achievable by our solution. Finally, we prove a Turing completeness result with constant width when the extra attention heads are utilized.
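To make the architecture described in the abstract concrete, here is a minimal NumPy sketch of the kind of looped update it refers to: a transformer-style layer with one ordinary attention head and one extra head whose attention is masked by the graph's adjacency matrix, applied repeatedly with the same fixed-size weights. All names, dimensions, the ReLU feed-forward update, the random initialization, and the fixed iteration count are illustrative assumptions, not details of the paper's construction.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv, mask=None):
    """Single attention head; an optional 0/1 mask restricts which nodes may attend to which."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    if mask is not None:
        scores = np.where(mask > 0, scores, -1e9)  # block attention along non-edges
    return softmax(scores, axis=-1) @ (X @ Wv)

def looped_step(X, A, params):
    """One pass of a looped-transformer-style layer: an ordinary self-attention head plus an
    extra head masked by the adjacency matrix A, followed by a small feed-forward update
    with a residual connection."""
    h_global = attention_head(X, *params["global"])         # unrestricted attention
    h_graph = attention_head(X, *params["graph"], mask=A)   # attends only along graph edges
    H = np.concatenate([h_global, h_graph], axis=-1)
    return X + np.maximum(H @ params["W1"], 0.0) @ params["W2"]  # ReLU MLP + residual

# Toy run: the same fixed-size parameters are reused at every iteration and for any graph size.
rng = np.random.default_rng(0)
n, d = 5, 8
A = (rng.random((n, n)) < 0.4).astype(float)   # random directed graph
np.fill_diagonal(A, 1.0)                       # allow self-attention on the graph head
X = rng.standard_normal((n, d))                # initial node states

params = {
    "global": [rng.standard_normal((d, d)) * 0.1 for _ in range(3)],  # Wq, Wk, Wv
    "graph": [rng.standard_normal((d, d)) * 0.1 for _ in range(3)],
    "W1": rng.standard_normal((2 * d, 2 * d)) * 0.1,
    "W2": rng.standard_normal((2 * d, d)) * 0.1,
}

for _ in range(10):   # the "loop" in looped transformer: weights are shared across steps
    X = looped_step(X, A, params)
print(X.shape)        # (5, 8): node states after 10 simulated steps
```

Note that every parameter shape depends only on the feature width d, never on the number of nodes n, which mirrors the abstract's claim that the parameter count is independent of the input graph size.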
- The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
- What learning algorithm is in-context learning? Investigations with linear models. arXiv preprint arXiv:2211.15661, 2022.
- Neural algorithmic reasoning with causal regularisation. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- Combinatorial optimization and reasoning with graph neural networks. Journal of Machine Learning Research, 24(130):1–61, 2023.
- Introduction to Algorithms. MIT Press, 2022.
- Scaling vision transformers to 22 billion parameters. In International Conference on Machine Learning, pages 7480–7512, 2023.
- Relational attention: Generalizing transformers for graph-structured tasks. In The Eleventh International Conference on Learning Representations, 2023.
- Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numer. Math., 1959. doi: 10.1007/BF01386390.
- Graph neural networks are dynamic programmers. Advances in Neural Information Processing Systems, 35:20635–20647, 2022.
- Parallel algorithms align with neural execution. In The Second Learning on Graphs Conference, 2023.
- Learning transformer programs. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Beyond Erdős-Rényi: Generalization in algorithmic reasoning on graphs. In NeurIPS 2023 Workshop: I Can’t Believe It’s Not Better Workshop: Failure Modes in the Age of Foundation Models, 2023a.
- Neural algorithmic reasoning for combinatorial optimisation. In The Second Learning on Graphs Conference, 2023b.
- Looped transformers as programmable computers. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 11398–11442, 2023.
- Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
- Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016.
- A generalist neural algorithmic learner. In Learning on Graphs Conference, 2022.
- Neural GPUs learn algorithms. In International Conference on Learning Representations, 2016.
- Transformers in vision: A survey. ACM Computing Surveys, 54(10s):1–41, 2022.
- Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
- Teaching arithmetic to small transformers. In The Twelfth International Conference on Learning Representations, 2024.
- Tracr: Compiled transformers as a laboratory for interpretability. arXiv preprint arXiv:2301.05062, 2023.
- Transformers learn shortcuts to automata. In The Eleventh International Conference on Learning Representations, 2023.
- Andreas Loukas. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations, 2020.
- URISC: The ultimate reduced instruction set computer. International Journal of Electrical Engineering Education, 25(4):327–334, 1988.
- Edward F. Moore. The shortest path through a maze. Bell Telephone System Technical Publications, Monograph, 1959.
- Dual algorithmic reasoning. In The Eleventh International Conference on Learning Representations, 2023.
- Attention is Turing-complete. Journal of Machine Learning Research, 22(75):1–35, 2021.
- Neural programmer-interpreters. In The Fourth International Conference on Learning Representations, 2016.
- Neural algorithmic reasoning without intermediate supervision. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- On the computational power of neural nets. Journal of Computer and System Sciences, 50:132–150, 1995.
- Towards scale-invariant graph-related problem solving by iterative homogeneous GNNs. Advances in Neural Information Processing Systems, 33:15811–15822, 2020.
- Sparse sinkhorn attention. In International Conference on Machine Learning, pages 9438–9447, 2020.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- Neural algorithmic reasoning. Patterns, 2(7), 2021.
- The CLRS algorithmic reasoning benchmark. In International Conference on Machine Learning, pages 22084–22102, 2022.
- Graph attention networks. In International Conference on Learning Representations, 2018.
- Neural execution of graph algorithms. In International Conference on Learning Representations, 2020.
- Statistically meaningful approximation: a case study on approximating Turing machines with transformers. Advances in Neural Information Processing Systems, 35:12071–12083, 2022a.
- Emergent abilities of large language models. Transactions on Machine Learning Research, 2022b. ISSN 2835-8856.
- What can neural networks reason about? In International Conference on Learning Representations, 2020.
- Neural execution engines: Learning to execute subroutines. Advances in Neural Information Processing Systems, 33:17298–17308, 2020.
- Incorporating convolution designs into visual transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 579–588, 2021.