Can Language Models Pretend Solvers? Logic Code Simulation with LLMs (2403.16097v2)
Abstract: Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the strong capabilities of LLMs for code-related tasks, several frameworks that leverage logical solvers for logic reasoning have been proposed recently. While existing research predominantly casts LLMs as natural-language logic solvers or translators, their roles as logic code interpreters and executors have received limited attention. This study delves into a novel aspect, namely logic code simulation, which tasks LLMs with emulating logical solvers by predicting the results of logical programs. To investigate this task, we formulate three research questions: Can LLMs efficiently simulate the outputs of logic code? What strengths arise with logic code simulation? And what pitfalls? To address these inquiries, we curate three novel datasets tailored to the logic code simulation task and conduct thorough experiments to establish baseline LLM performance on code simulation. We then introduce a novel LLM-based code simulation technique, Dual Chains of Logic (DCoL), which advocates a dual-path thinking approach for LLMs and achieves state-of-the-art performance over other prompting strategies, improving accuracy by 7.06% with GPT-4 Turbo.
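To make the task concrete, below is a minimal Python sketch of logic code simulation with a DCoL-style dual-path prompt: the LLM is asked to reason separately toward satisfiability and unsatisfiability before committing to a verdict. The prompt wording, the `DCOL_TEMPLATE` and `simulate_logic_code` names, and the use of the OpenAI chat API are illustrative assumptions for this excerpt, not the authors' released implementation.

```python
# Sketch of logic code simulation with a dual-path (DCoL-style) prompt.
# Assumes the OpenAI Python SDK (v1) and OPENAI_API_KEY in the environment;
# prompt text and helper names are hypothetical, not from the paper's code.
from openai import OpenAI

client = OpenAI()

DCOL_TEMPLATE = """You are simulating a logic solver (e.g., Z3).
Given the program below, reason along two independent paths:
Path A: assume the constraints are satisfiable and try to construct a model.
Path B: assume they are unsatisfiable and try to derive a contradiction.
Compare both paths and answer with exactly `sat` or `unsat`.

Program:
{program}
"""

def simulate_logic_code(program: str, model: str = "gpt-4-turbo") -> str:
    """Ask the LLM to predict the solver's verdict for a logic program."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": DCOL_TEMPLATE.format(program=program)}],
        temperature=0,  # deterministic decoding for evaluation
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # The two constraints conflict, so a faithful simulator should print `unsat`.
    z3_snippet = """(declare-const x Int)
(assert (> x 3))
(assert (< x 2))
(check-sat)"""
    print(simulate_logic_code(z3_snippet))
```

In this sketch the final verdict is whatever the model emits after weighing both paths; an evaluation harness would compare that prediction against the ground-truth output of an actual solver such as Z3.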
Authors: Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue