Abstract

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of LLMs in code synthesis, researchers are exploring their potential to automate code translation. A prerequisite for advancing LLM-based code translation is understanding its promise and limitations relative to existing techniques. To that end, we present a large-scale empirical study investigating the ability of general LLMs and code LLMs to translate code across pairs of different languages, including C, C++, Go, Java, and Python. Our study, which involves the translation of 1,700 code samples from three benchmarks and two real-world projects, reveals that LLMs cannot yet be relied on to automate code translation: correct translations range from 2.1% to 47.3% for the studied LLMs. Further manual investigation of unsuccessful translations identifies 15 categories of translation bugs. We also compare LLM-based code translation with traditional non-LLM-based approaches. Our analysis shows that each of these two classes of techniques has its own strengths and weaknesses. Finally, insights from our study suggest that providing more context to LLMs during translation can help them produce better results. To that end, we propose a prompt-crafting approach based on the symptoms of erroneous translations; this improves the performance of LLM-based code translation by 5.5% on average. Our study is the first of its kind, in terms of scale and breadth, to provide insights into the current limitations of LLMs in code translation and opportunities for improving them. Our dataset, consisting of 1,700 code samples in five PLs with 10K+ tests, 43K+ pieces of translated code, 1,748 manually labeled bugs, and 1,365 bug-fix pairs, can help drive research in this area.

