FuzzCoder: Byte-level Fuzzing Test via Large Language Model (2409.01944v1)
Abstract: Fuzzing is an important dynamic program analysis technique for finding vulnerabilities in complex software. It works by presenting a target program with crafted malicious inputs to trigger crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs efficiently remains a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose fine-tuned LLMs (FuzzCoder) that learn patterns from the input files of successful attacks to guide future fuzzing explorations. Specifically, we develop a framework that leverages code LLMs to guide the mutation process of inputs during fuzzing. The mutation process is formulated as sequence-to-sequence modeling: the LLM receives a sequence of bytes and outputs a mutated byte sequence. FuzzCoder is fine-tuned on a purpose-built instruction dataset (Fuzz-Instruct), in which successful fuzzing histories are collected from a heuristic fuzzing tool. FuzzCoder predicts both the mutation locations and the mutation strategies within input files that trigger abnormal program behavior. Experimental results show that FuzzCoder, built on AFL (American Fuzzy Lop), gains significant improvements in the effective proportion of mutations (EPM) and the number of crashes (NC) for various input formats, including ELF, JPG, MP3, and XML.
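To make the sequence-to-sequence formulation concrete, the sketch below mimics the interface the abstract describes: a model consumes a seed's raw bytes and predicts (position, strategy) mutations, which are then applied to produce a new input for the fuzzer. This is a minimal illustration under stated assumptions, not the paper's implementation: `model_mutate` is a hypothetical random stand-in for the fine-tuned LLM, and the strategy names are illustrative approximations of AFL-style mutation operators.

```python
import random
from typing import List, Tuple

# Hypothetical stand-in for the fine-tuned model: given the raw byte
# sequence of a seed input, predict (position, strategy) pairs saying
# where and how to mutate. A real deployment would replace this with
# generation from the fine-tuned code LLM described in the abstract.
def model_mutate(seed: bytes) -> List[Tuple[int, str]]:
    strategies = ["bitflip", "byte_replace", "insert", "delete"]
    n = max(1, len(seed) // 16)  # mutate roughly 1 in 16 bytes
    return [(random.randrange(len(seed)), random.choice(strategies))
            for _ in range(n)]

def apply_mutations(seed: bytes, edits: List[Tuple[int, str]]) -> bytes:
    """Apply predicted (position, strategy) edits to a seed input."""
    buf = bytearray(seed)
    # Apply edits right-to-left so inserts/deletes do not shift the
    # positions of edits that have not been applied yet.
    for pos, strategy in sorted(edits, reverse=True):
        if pos >= len(buf):  # a prior delete may have shortened the buffer
            continue
        if strategy == "bitflip":
            buf[pos] ^= 1 << random.randrange(8)    # flip one random bit
        elif strategy == "byte_replace":
            buf[pos] = random.randrange(256)        # overwrite the byte
        elif strategy == "insert":
            buf.insert(pos, random.randrange(256))  # splice in a new byte
        elif strategy == "delete" and len(buf) > 1:
            del buf[pos]
    return bytes(buf)

# A toy seed: a JPEG-like header followed by padding bytes.
seed = b"\xff\xd8\xff\xe0" + bytes(32)
mutated = apply_mutations(seed, model_mutate(seed))
# `mutated` would then be fed to the target program by the fuzzer
# (e.g. AFL); inputs that trigger crashes become the training signal
# for Fuzz-Instruct-style data collection.
```

Framing mutations over raw byte sequences, rather than text tokens, presumably sidesteps tokenizer mismatch with binary formats such as ELF and JPG, which would explain the abstract's byte-level formulation.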