- The paper introduces a grammar-aware fuzzing approach that integrates AST-based trimming and mutation to preserve input structure and improve bug detection.
- Compared with AFL, it raises line coverage by 16.7% and function coverage by 8.8%, and it uncovers 31 new bugs, including 21 previously undetected vulnerabilities.
- The study suggests that grammar-aware techniques can make fuzz testing more effective on structured inputs, and it encourages further research on automatic grammar inference.
Superion: Grammar-Aware Greybox Fuzzing
The paper "Superion: Grammar-Aware Greybox Fuzzing" introduces an approach to fuzz testing that targets programs processing structured inputs such as XML and JavaScript. Traditional coverage-based greybox fuzzers like American Fuzzy Lop (AFL) have been very successful at finding vulnerabilities in applications that process unstructured inputs. Their effectiveness drops considerably on structured inputs, however, because their mutation and trimming strategies are grammar-blind: they operate on raw bytes and tend to produce inputs that fail early syntax checks.
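To make the cost of grammar-blind mutation concrete, the following minimal sketch flips every single bit of a small structured input and counts how many mutants still parse. It is an illustration only: JSON and Python's `json` module stand in for the XML and JavaScript inputs discussed in the paper, and the function name `count_bit_flip_survivors` is hypothetical, not taken from AFL or Superion.

```python
import json


def count_bit_flip_survivors(seed: bytes):
    """Flip each bit of `seed` once; return (valid, invalid) mutant counts."""
    valid = invalid = 0
    for pos in range(len(seed)):
        for bit in range(8):
            mutant = bytearray(seed)
            mutant[pos] ^= 1 << bit  # grammar-blind: any bit, any position
            try:
                json.loads(bytes(mutant))
                valid += 1
            except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
                invalid += 1
    return valid, invalid


valid, invalid = count_bit_flip_survivors(b'{"a": [1, 2, 3]}')
print(f"{valid} mutants still parse, {invalid} are rejected by the parser")
```

On this seed most mutants are rejected, and the survivors only alter a digit or a character inside the key, so none of them changes the structure of the input. A real target would spend most of its executions in its parser's error paths, which is the problem grammar awareness is meant to address.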
Proposition
The authors address these limitations with Superion, a grammar-aware extension to AFL. Given the grammar of the input format, Superion parses each test input into an abstract syntax tree (AST) and uses the tree to guide both trimming and mutation. The goal is to keep test inputs syntactically valid while widening and deepening the fuzzer's exploration.
- Grammar-aware Trimming: The trimming strategy uses the AST to prune test inputs incrementally, at the level of subtrees, so that each trimmed input still conforms to the grammar. This avoids the pitfall of conventional trimming, which can break the structure of an input and leave the fuzzer working on a malformed seed.
- Grammar-aware Mutation: Superion introduces two mutation strategies:
  - Enhanced dictionary-based mutations utilize grammatically significant tokens and strategically apply them to input boundaries identified using the grammar rules.
  - Tree-based mutations leverage the AST representation, replacing subtrees within the input to create new, potentially interesting test inputs.
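The trimming and tree-based mutation ideas above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Superion's implementation: Python's built-in `ast` module (3.9+ for `ast.unparse`) stands in for a grammar-driven parser, trimming is limited to top-level statements, subtree replacement is restricted to nodes of the same type so that the result stays parseable, and `grammar_trim`, `tree_mutate`, `called_names`, and the `keeps_coverage` oracle are hypothetical names. In a real fuzzer the oracle would run the target program and compare coverage.

```python
import ast
import copy
import random


def grammar_trim(src, keeps_coverage):
    """Drop top-level statements one at a time while the oracle still accepts."""
    tree = ast.parse(src)
    i = 0
    while i < len(tree.body):
        removed = tree.body.pop(i)
        if keeps_coverage(ast.unparse(tree)):
            continue  # removal kept; index i now holds the next statement
        tree.body.insert(i, removed)  # removal rejected; restore and move on
        i += 1
    return ast.unparse(tree)


class _ReplaceNode(ast.NodeTransformer):
    """Swap exactly one node (matched by identity) for a replacement."""

    def __init__(self, victim, replacement):
        self.victim = victim
        self.replacement = replacement

    def visit(self, node):
        if node is self.victim:
            return copy.deepcopy(self.replacement)
        return self.generic_visit(node)


def tree_mutate(target_src, donor_src, rng, kind=ast.BinOp):
    """Replace one `kind` subtree of the target with one taken from the donor."""
    target = ast.parse(target_src)
    victims = [n for n in ast.walk(target) if isinstance(n, kind)]
    donors = [n for n in ast.walk(ast.parse(donor_src)) if isinstance(n, kind)]
    if not victims or not donors:
        return target_src  # nothing to swap; leave the input unchanged
    mutated = _ReplaceNode(rng.choice(victims), rng.choice(donors)).visit(target)
    return ast.unparse(mutated)


def called_names(src):
    """Toy stand-in for coverage: the set of plain function names called."""
    return {n.func.id for n in ast.walk(ast.parse(src))
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}


seed = "x = 1\ny = 2\nprint(x)\nz = 3"
trimmed = grammar_trim(seed, lambda s: called_names(s) == called_names(seed))
mutant = tree_mutate("total = a + b * 2", "r = (p - q) / 4", random.Random(0))
print(trimmed)
print(mutant)
```

Because both operations work on whole subtrees and re-serialize the tree afterwards, every candidate they produce is syntactically valid by construction, in contrast to byte-level trimming and mutation. The toy oracle only looks at which functions are called, so it accepts trims that a coverage-based oracle might reject.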
Evaluation
Superion was evaluated on real-world programs: the XML engine libplist and the JavaScript engines WebKit, Jerryscript, and ChakraCore. The results were promising, demonstrating:
- An improvement in line and function coverage by 16.7% and 8.8%, respectively, compared to AFL.
- A significant enhancement in bug-finding capability, discovering 31 new bugs, including 21 vulnerabilities not caught by AFL.
- Bug bounty rewards totaling 3.2K USD, which supports the practical effectiveness of Superion in real-world settings.
Implications and Future Work
The implications of Superion go beyond providing a more effective tool for fuzzing structured inputs. The results show that explicit grammar awareness can substantially improve fuzzing of grammar-bound formats. The same idea could be generalized to other domains in which standards or protocols define the input structure, which suggests broader applicability.
Future work could integrate automatic grammar inference so that the approach applies where formal grammars are unavailable or proprietary. Other possible avenues include optimizing grammar-aware parsing and mutation to reduce performance overhead, and exploring adaptive mutation techniques that cut the time spent on less effective mutations.
In summary, Superion is a meaningful step in fuzzer development: it combines traditional coverage-based fuzzing with grammar awareness to handle structured inputs more effectively. The work lays a foundation for further research in grammar-aware fuzzing of applications that process structured inputs.