LLM4Decompile: Decompiling Binary Code with Large Language Models (2403.05286v3)
Abstract: Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by advancements in LLMs, we propose LLM4Decompile, the first and largest open-source LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results. Our code, dataset, and models are released at https://github.com/albertan017/LLM4Decompile
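As a concrete illustration of the direct (LLM4Decompile-End) pipeline summarized above, the sketch below compiles a C function, disassembles it, asks the model to recover source code, and notes how re-executability would be checked by recompiling the output against the benchmark's unit tests. The Hugging Face model id, prompt wording, and file names are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of an LLM4Decompile-End style pipeline.
# Model id, prompt template, and file names are assumptions for illustration.
import subprocess
from transformers import AutoTokenizer, AutoModelForCausalLM

SRC = "func0.c"   # C source containing the target function (assumed)
BIN = "func0.o"   # compiled object file

# 1. Compile the C function to a binary (optimization level is a choice;
#    training covers multiple levels such as -O0 through -O3).
subprocess.run(["gcc", "-c", "-O0", SRC, "-o", BIN], check=True)

# 2. Disassemble the binary into assembly text with objdump.
asm = subprocess.run(["objdump", "-d", BIN],
                     capture_output=True, text=True).stdout

# 3. Ask the model to translate the assembly back into C.
#    The prompt wording is a plausible placeholder, not the released template.
prompt = f"# This is the assembly code:\n{asm}\n# What is the source code?\n"

model_id = "LLM4Decompile/llm4decompile-1.3b"   # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
decompiled_c = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
print(decompiled_c)

# 4. Re-executability check: recompile decompiled_c together with the
#    benchmark's unit tests and run the binary; the sample counts as
#    re-executable if all assertions pass.
```

The refinement variant (LLM4Decompile-Ref) follows the same shape, except the model is given Ghidra's decompiled pseudocode to clean up rather than raw assembly.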
- SLaDe: A portable small language model decompiler for optimized assembler. CoRR, abs/2305.12520.
- Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In Proceedings of the 22nd USENIX Security Symposium, Washington, DC, USA, August 14-16, 2013, pages 353–368. USENIX Association.
- Evaluating large language models trained on code. CoRR, abs/2107.03374.
- Modeling black-box components with probabilistic synthesis. In GPCE ’20: Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, Virtual Event, USA, November 16-17, 2020, pages 1–14. ACM.
- ANGHABENCH: A suite with one million compilable C benchmarks for code-size reduction. In IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021, Seoul, South Korea, February 27 - March 3, 2021, pages 378–390. IEEE.
- Ghidra. 2024. Ghidra software reverse engineering framework.
- DeepSeek-Coder: When the large language model meets programming - the rise of code intelligence. arXiv preprint arXiv:2401.14196.
- Hex-Rays. 2024. IDA Pro: a cross-platform multi-processor disassembler and debugger.
- Iman Hosseini and Brendan Dolan-Gavitt. 2022. Beyond the C: retargetable decompilation using neural machine translation. CoRR, abs/2212.08950.
- Nova+: Generative language models for binaries. CoRR, abs/2311.13721.
- Using recurrent neural networks for decompilation. In 25th International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, Campobasso, Italy, March 20-23, 2018, pages 346–356. IEEE Computer Society.
- DIRE: A neural approach to decompiled identifier naming. In 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019, pages 628–639. IEEE.
- Compiler validation via equivalence modulo inputs. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, pages 216–226. ACM.
- Thomas Lippincott. 2020. Starcoder: A general neural ensemble technique to support traditional scholarship, illustrated with a study of the post-atlantic slave trade. In 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, Ottawa, Canada, July 20-25, 2020, Conference Abstracts.
- Zhibo Liu and Shuai Wang. 2020. How far we have come: testing decompilation correctness of C decompilers. In ISSTA ’20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020, pages 475–487. ACM.
- Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.
- OpenAI. 2023. GPT-4 technical report. CoRR, abs/2303.08774.
- Code llama: Open foundation models for code. CoRR, abs/2308.12950.
- Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
- Richard M. Stallman et al. 2003. Using the GNU Compiler Collection. Free Software Foundation, 4(02).
- Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017 - Companion Volume, pages 180–182. IEEE Computer Society.
- Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.
- A new algorithm for identifying loops in decompilation. In Static Analysis, 14th International Symposium, SAS 2007, Kongens Lyngby, Denmark, August 22-24, 2007, Proceedings, volume 4634 of Lecture Notes in Computer Science, pages 170–183. Springer.
- Huggingface’s transformers: State-of-the-art natural language processing. CoRR, abs/1910.03771.
- Refining decompiled C code with large language models. CoRR, abs/2310.06530.
- A systematic evaluation of large language models of code. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, MAPS 2022, pages 1–10, New York, NY, USA. Association for Computing Machinery.
- LmPa: Improving decompilation by synergy of large language model and program analysis. CoRR, abs/2306.02546.
- An extensive study on pre-trained models for program understanding and generation. In ISSTA ’22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18 - 22, 2022, pages 39–51. ACM.