Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference (2311.04448v4)
Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources. As a result, they suffer from (1) false negatives caused by the incompleteness of the predefined resource acquisition/release APIs and (2) false positives caused by incomplete identification of resource reachability validation. To overcome these challenges, we propose InferROI, a novel approach that leverages the exceptional code comprehension capability of LLMs to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code. InferROI first prompts the LLM to infer the intentions involved in a given code snippet, and then applies a two-stage static analysis that checks control-flow paths for resource leaks based on the inferred intentions. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection. Experimental results on the DroidLeaks and JLeaks datasets show that InferROI achieves promising bug detection rates (59.3% and 62.5%, respectively) and false alarm rates (18.6% and 19.5%, respectively). Compared to three industrial static detectors, InferROI detects 14 to 45 and 149 to 485 more bugs in DroidLeaks and JLeaks, respectively. When applied to real-world open-source projects, InferROI identifies 29 unknown resource leak bugs (verified by the authors), 7 of which have been confirmed by developers. In addition, the results of an ablation study underscore the importance of combining LLM-based inference with static analysis.
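The idea of feeding inferred intentions into a control-flow path check can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not InferROI's implementation. It assumes a hypothetical representation in which each CFG node already carries LLM-inferred `(intention, variable)` pairs, and it collapses the paper's two-stage analysis into a single enumeration of acyclic entry-to-exit paths. A resource is reported if it is still held at the exit node on some path. On the "invalid" branch of a reachability validation (for example, `c != null` evaluating to false), the resource is dropped from the held set, which is how such checks avoid the false positives described in the abstract. The names `Node`, `find_leaks`, and the `"valid"`/`"invalid"` branch labels are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical intention labels; not InferROI's actual data model.
ACQUIRE, RELEASE, VALIDATE = "acquire", "release", "validate"


@dataclass
class Node:
    # (intention, resource variable) pairs assumed to come from the LLM
    intentions: list = field(default_factory=list)
    # Successor edges as (target node name, branch label or None)
    succs: list = field(default_factory=list)


def find_leaks(cfg, entry, exit_node):
    """Return resource variables left unreleased on at least one acyclic
    entry-to-exit path (a simplified, single-pass sketch)."""
    leaks = set()

    def walk(name, held, visited):
        node = cfg[name]
        held = set(held)
        for intent, var in node.intentions:
            if intent == ACQUIRE:
                held.add(var)
            elif intent == RELEASE:
                held.discard(var)
        if name == exit_node:
            leaks.update(held)
            return
        for succ, branch in node.succs:
            if succ in visited:
                continue  # skip back edges so path enumeration terminates
            nxt = set(held)
            # On the invalid branch of a validation (e.g. c == null), the
            # resource was never successfully acquired, so it cannot leak.
            if branch == "invalid":
                for intent, var in node.intentions:
                    if intent == VALIDATE:
                        nxt.discard(var)
            walk(succ, nxt, visited | {succ})

    walk(entry, set(), {entry})
    return leaks


# c = query(); if (c != null) { c.close(); } return;
safe = {
    "n1": Node([(ACQUIRE, "c")], [("n2", None)]),
    "n2": Node([(VALIDATE, "c")], [("n3", "valid"), ("exit", "invalid")]),
    "n3": Node([(RELEASE, "c")], [("exit", None)]),
    "exit": Node(),
}
# c = query(); if (cond) return; c.close(); return;
leaky = {
    "n1": Node([(ACQUIRE, "c")], [("n2", None)]),
    "n2": Node([], [("exit", None), ("n3", None)]),
    "n3": Node([(RELEASE, "c")], [("exit", None)]),
    "exit": Node(),
}

print(find_leaks(safe, "n1", "exit"))   # set()
print(find_leaks(leaky, "n1", "exit"))  # {'c'}
```

Enumerating paths is exponential in the number of branches, so this style only suits small methods; a practical detector would more likely use a dataflow formulation over the same intention labels.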