
Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference (2311.04448v4)

Published 8 Nov 2023 in cs.SE

Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources, suffering from both (1) false negatives caused by the incompleteness of predefined resource acquisition/release APIs and (2) false positives caused by the incompleteness of resource reachability validation identification. To overcome these challenges, we propose InferROI, a novel approach that leverages the exceptional code comprehension capability of LLMs to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code. InferROI first prompts the LLM to infer involved intentions for a given code snippet, and then incorporates a two-stage static analysis approach to check control-flow paths for resource leak detection based on the inferred intentions. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection. Experimental results on the DroidLeaks and JLeaks datasets demonstrate that InferROI achieves promising bug detection rates (59.3% and 62.5%) and false alarm rates (18.6% and 19.5%). Compared to three industrial static detectors, InferROI detects 14~45 and 149~485 more bugs in DroidLeaks and JLeaks, respectively. When applied to real-world open-source projects, InferROI identifies 29 unknown resource leak bugs (verified by authors), with 7 of them being confirmed by developers. In addition, the results of an ablation study underscore the importance of combining LLM-based inference with static analysis.


Summary

  • The paper introduces InferROI, which uses LLMs to infer resource-oriented intentions (acquisition, release, and reachability validation) and so improves static detection of resource leaks.
  • It employs a two-stage detection process that first identifies leak-risk paths and then prunes false positives through reachability validations.
  • Empirical evaluations on DroidLeaks and JLeaks report bug detection rates of 59.3% and 62.5%; intention inference reaches 74.6% precision and 81.8% recall, and the tool uncovered 29 previously unknown leaks in open-source projects.

Inferring Resource-Oriented Intentions using LLMs for Static Resource Leak Detection

The paper presents InferROI, an approach that uses LLMs to infer resource-oriented intentions directly from code and feeds them into a two-stage static analysis. Compared with detectors that match predefined APIs, it improves bug detection rates and reduces false alarms.

Overview

Resource leaks, which occur when acquired resources are not released, have long been recognized as critical software defects that cause performance degradation and system failures. Traditional static analysis methods for resource leak detection depend on predefined API pairs and mechanical matching. Incomplete lists of acquisition/release APIs lead to false negatives, and incomplete identification of reachability validation conditions (such as null checks) leads to false positives.

InferROI brings a fresh perspective by employing the advanced code comprehension capabilities of LLMs to infer three distinct resource-oriented intentions in code: resource acquisition, release, and reachability validation. This inference does not depend on prior knowledge of resource-specific APIs, making it more adaptable and expansive in detecting diverse resource types.
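One way to picture the three intentions is as labelled annotations on lines of a snippet. The sketch below is illustrative only: the textual format `KIND(variable, line)`, the labels `ACQ`, `REL`, and `VAL`, and the helper names are assumptions made for this example, not the paper's actual notation or tooling.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Intention(NamedTuple):
    kind: str      # "ACQ", "REL", or "VAL" (hypothetical labels)
    resource: str  # variable that holds the resource
    line: int      # line of the snippet the intention refers to


# Hypothetical format for one inferred intention, e.g. "ACQ(cursor, 1)".
PATTERN = re.compile(r"^(ACQ|REL|VAL)\((\w+),\s*(\d+)\)$")


def parse_intentions(text: str) -> list[Intention]:
    """Parse one intention per line; lines that do not match are skipped."""
    found = []
    for raw in text.splitlines():
        match = PATTERN.match(raw.strip())
        if match:
            kind, resource, line = match.groups()
            found.append(Intention(kind, resource, int(line)))
    return found


def group_by_resource(intentions):
    """Collect the intentions that concern the same resource variable."""
    grouped = defaultdict(list)
    for item in intentions:
        grouped[item.resource].append(item)
    return dict(grouped)


# Made-up model output for a snippet that opens a cursor on line 1,
# null-checks it on line 2, and closes it on line 9.
llm_output = """ACQ(cursor, 1)
VAL(cursor, 2)
REL(cursor, 9)
some free-form remark the parser ignores"""

intentions = parse_intentions(llm_output)
```

Because nothing in this representation names a specific API, the same structure would apply to any resource type the model recognizes, which is the adaptability the paragraph above describes.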

Methodology

The approach outlined in the paper consists of several key components:

  1. Resource-Oriented Intention Inference: The paper employs prompting strategies tailored for GPT-4, instructing it to discern resource-related intentions from given code snippets. The LLM analyzes the syntax and semantics of the code to infer potential acquisition, release, and validation intentions. The extracted intentions are then formalized into expressions suitable for subsequent analysis.
  2. Lightweight Static Analysis: Once intentions are inferred, InferROI applies a two-stage path analysis to detect resource leaks effectively. The first stage identifies potential leak-risky paths based on the inferred acquisition and release intentions. The second stage prunes these paths by assessing resource reachability validation, thus reducing false positives.
  3. Application and Evaluation: On the DroidLeaks and JLeaks datasets, InferROI achieved bug detection rates of 59.3% and 62.5%, with false alarm rates of 18.6% and 19.5%, respectively. Compared with three established static analysis tools (SpotBugs, Infer, and PMD), it detected 14~45 more bugs on DroidLeaks and 149~485 more on JLeaks. InferROI also identified 29 previously unknown resource leaks in real-world open-source projects, 7 of which were confirmed by developers, underscoring its practical utility.
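The two-stage path analysis in step 2 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: control-flow paths are given as lists of line numbers, the inferred intentions are reduced to sets of lines, and `UNREACHABLE_EDGES` is a hypothetical encoding of branches on which a validation (for example a null check) shows that the resource does not exist.

```python
# Each path is the sequence of line numbers one execution of a snippet visits.
# Hypothetical snippet: acquire `cursor` on line 1, null-check it on line 2,
# return early on line 3 (cursor is null) or on line 7 (cursor is still open),
# and close it on line 9.
PATHS = [
    [1, 2, 3],
    [1, 2, 5, 6, 7],
    [1, 2, 5, 6, 9, 10],
]
ACQUIRE_LINES = {1}           # lines carrying an acquisition intention
RELEASE_LINES = {9}           # lines carrying a release intention
UNREACHABLE_EDGES = {(2, 3)}  # branch taken when the null check finds no resource


def leak_risky_paths(paths, acquire_lines, release_lines):
    """Stage 1: keep paths that acquire the resource and never release it afterwards."""
    risky = []
    for path in paths:
        acquired_at = next(
            (k for k, line in enumerate(path) if line in acquire_lines), None
        )
        if acquired_at is None:
            continue  # nothing acquired on this path
        if not any(line in release_lines for line in path[acquired_at + 1:]):
            risky.append(path)
    return risky


def prune_unreachable(paths, unreachable_edges):
    """Stage 2: drop paths that follow a branch on which the resource is unreachable."""
    kept = []
    for path in paths:
        edges = set(zip(path, path[1:]))
        if edges.isdisjoint(unreachable_edges):
            kept.append(path)
    return kept


risky = leak_risky_paths(PATHS, ACQUIRE_LINES, RELEASE_LINES)
leaks = prune_unreachable(risky, UNREACHABLE_EDGES)
print(leaks)  # [[1, 2, 5, 6, 7]]
```

Stage 1 alone would flag the early return on line 3 as a leak. Stage 2 removes it because the cursor is null on that branch, which is the kind of false positive the reachability validation is meant to prune.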

Findings and Implications

InferROI marks a clear step forward in resource leak detection and demonstrates the potential of LLMs in static analysis. The detection results, together with a precision of 74.6% and a recall of 81.8% in intention inference, indicate that it can identify diverse resource types across different codebases.

  • Broader Coverage of Resource Types: By decoupling from the constraints of predefined API pairs, InferROI achieves broader resource type coverage, outperforming traditional static analyzers which often miss less common or newly introduced resource types.
  • Scalability and Flexibility: Because the LLM infers intentions without predefined knowledge, the approach can be applied in varied programming environments without extensive manual configuration.
  • Complementary to Existing Techniques: While showcasing independent effectiveness, InferROI's approach can complement more rigorous program analysis techniques, opening avenues for hybrid detection strategies that leverage both LLM-based inference and sound static analysis.
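For reference, the precision and recall quoted above for intention inference are assumed here to follow the standard definitions, where TP counts inferred intentions that match the ground truth, FP counts inferred intentions that do not, and FN counts ground-truth intentions the model missed. The exact matching protocol is the paper's and is not reproduced here.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```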

Future Directions

The findings from this paper open up several research opportunities. Future work could extend this approach to additional programming languages and integrate more sophisticated program analysis methodologies to address scenarios such as Android’s complex lifecycle management. Additionally, advancements in LLM technology and fine-tuning could further empower the inference capabilities, bridging the gap between syntactic comprehension and deeper semantic understanding.

Conclusion

InferROI embodies a promising advancement in static resource leak detection, offering a robust framework that effectively incorporates LLM-based resource-oriented intention inference. This work emphasizes the important role of AI in enhancing static analysis tools, providing a pathway to more intelligent and adaptive defect detection methodologies.
