
Towards Understanding What Code Language Models Learned (2306.11943v2)

Published 20 Jun 2023 in cs.SE, cs.CL, and cs.LG

Abstract: Pre-trained LLMs are effective in a variety of natural language tasks, but it has been argued that their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which LLMs can learn some form of meaning, we investigate their ability to capture the semantics of code beyond superficial frequency and co-occurrence. In contrast to previous research on probing models for linguistic features, we study pre-trained models in a setting that allows for objective and straightforward evaluation of a model's ability to learn semantics. In this paper, we examine whether such models capture the semantics of code, which is precisely and formally defined. Through experiments involving the manipulation of code fragments, we show that pre-trained models of code learn a robust representation of the computational semantics of code that goes beyond superficial features of form alone.
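
The experiments described in the abstract hinge on manipulating code fragments so that surface form and computational semantics are decoupled. As a rough illustration only (not the paper's actual transformations or models), the Python sketch below contrasts a semantics-altering edit that barely changes the text with a semantics-preserving edit that changes the text substantially; a model that has learned semantics rather than form should treat the second pair, not the first, as equivalent.

```python
# Hypothetical illustration of decoupling form from semantics in code fragments.
# The specific functions and edits here are assumptions for exposition, not the
# transformations used in the paper.

def original(xs):
    """Sum the elements of xs."""
    total = 0
    for x in xs:
        total += x
    return total

# Form-preserving, semantics-altering edit: a single-character operator change.
def operator_swapped(xs):
    total = 0
    for x in xs:
        total -= x  # '-' instead of '+': nearly identical text, different meaning
    return total

# Form-altering, semantics-preserving edit: consistent identifier renaming.
def renamed(values):
    acc = 0
    for v in values:
        acc += v
    return acc

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    assert original(data) == renamed(data)            # different form, same semantics
    assert original(data) != operator_swapped(data)   # similar form, different semantics
    print(original(data), operator_swapped(data), renamed(data))  # 10 -10 10
```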

