Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion (2401.12947v1)
Abstract: This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to the programming language and formal mathematics tasks where symbolic tools currently outperform neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general framework that connects the abstract concepts of structural recursion in the programming language domain to concrete sequence modeling problems and the behavior of learned models. The framework includes a representation that captures the general *syntax* of structural recursion, coupled with two complementary frameworks for understanding its *semantics*: one that is more natural from a programming languages perspective, and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture. Using this framework as a conceptual tool, we identify distinct failure modes across several setups. Models trained to emulate recursive computations do not fully capture the recursion; instead, they fit shortcut algorithms and therefore fail on edge cases that are under-represented in the training distribution. In addition, state-of-the-art LLMs struggle to mine recursive rules from in-context demonstrations. These LLMs also fail in interesting ways when emulating the reduction (step-wise computation) of recursive functions.
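To make the abstract's notions concrete, below is a minimal sketch (our own illustration, not the paper's actual task encoding or training format): a Peano-style inductive datatype and a structurally recursive function over it, together with the kind of step-wise reduction ("emulating program behavior") that the abstract refers to.

```ocaml
(* Illustrative sketch only: a Peano-style datatype and a structurally
   recursive function, not the paper's actual benchmark representation. *)

type nat = Zero | Succ of nat

(* Structural recursion: the recursive call is made only on the immediate
   sub-structure m' of the input Succ m', so termination is guaranteed. *)
let rec add (m : nat) (n : nat) : nat =
  match m with
  | Zero -> n
  | Succ m' -> Succ (add m' n)

(* Step-wise reduction of add (Succ (Succ Zero)) (Succ Zero):
     add (Succ (Succ Zero)) (Succ Zero)
   = Succ (add (Succ Zero) (Succ Zero))
   = Succ (Succ (add Zero (Succ Zero)))
   = Succ (Succ (Succ Zero))              i.e., 2 + 1 = 3 *)

(* Helper for printing, used only in this sketch. *)
let rec to_int = function
  | Zero -> 0
  | Succ n -> 1 + to_int n

let () =
  let two = Succ (Succ Zero) in
  let one = Succ Zero in
  Printf.printf "%d\n" (to_int (add two one))  (* prints 3 *)
```

Emulating such a function from examples means predicting the output term (or each intermediate reduction step) from the input term as a sequence modeling problem; the paper's framework relates that sequence-level task back to the syntax and semantics of the recursion being emulated.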