Fuzzing Deep-Learning Libraries via Automated Relational API Inference (2207.05531v1)

Published 12 Jul 2022 in cs.SE

Abstract: A growing body of research has been dedicated to DL model testing. However, there is still limited work on testing DL libraries, which serve as the foundations for building, training, and running DL models. Prior work on fuzzing DL libraries can only generate tests for APIs which have been invoked by documentation examples, developer tests, or DL models, leaving a large number of APIs untested. In this paper, we propose DeepREL, the first approach to automatically inferring relational APIs for more effective DL library fuzzing. Our basic hypothesis is that for a DL library under test, there may exist a number of APIs sharing similar input parameters and outputs; in this way, we can easily "borrow" test inputs from invoked APIs to test other relational APIs. Furthermore, we formalize the notion of value equivalence and status equivalence for relational APIs to serve as the oracle for effective bug finding. We have implemented DeepREL as a fully automated end-to-end relational API inference and fuzzing technique for DL libraries, which 1) automatically infers potential API relations based on API syntactic or semantic information, 2) synthesizes concrete test programs for invoking relational APIs, 3) validates the inferred relational APIs via representative test inputs, and finally 4) performs fuzzing on the verified relational APIs to find potential inconsistencies. Our evaluation on two of the most popular DL libraries, PyTorch and TensorFlow, demonstrates that DeepREL can cover 157% more APIs than state-of-the-art FreeFuzz. To date, DeepREL has detected 162 bugs in total, with 106 already confirmed by the developers as previously unknown bugs. Surprisingly, DeepREL has detected 13.5% of the high-priority bugs for the entire PyTorch issue-tracking system in a three-month period. Also, besides the 162 code bugs, we have also detected 14 documentation bugs (all confirmed).


Summary

The paper introduces DeepREL, a novel method that automatically infers relational API pairs to enhance fuzzing in deep-learning libraries.
It applies mutation-based fuzzing to the verified API relations, uncovering 162 bugs, 106 of which developers have confirmed as previously unknown.
Evaluated on PyTorch and TensorFlow, DeepREL covers 157% more APIs than FreeFuzz and accounts for 13.5% of the high-priority bugs in the PyTorch issue-tracking system over a three-month period.


Introduction

Deep learning (DL) libraries, such as TensorFlow and PyTorch, are the foundation for building, training, and running DL models. As these libraries grow in complexity and reach, their reliability matters more. Most existing testing research targets DL models rather than the libraries themselves, which leaves the libraries' APIs largely untested directly. Limited API coverage and weak test oracles call for new testing approaches. This paper introduces DeepREL, an approach that fuzzes DL libraries by automatically inferring relational APIs.

Background and Motivation

Existing methods for fuzzing DL libraries either test at the model level or require extensive manual effort to test at the API level. Both suffer from insufficient API coverage and ineffective test oracles. For instance, model-level testing techniques like CRADLE and its successors (AUDEE and LEMON) only cover a fraction of available DL library APIs. API-level testing, as exemplified by FreeFuzz, substantially improves API coverage, but it can only generate tests for APIs that are already invoked by documentation examples, developer tests, or DL models, so many APIs remain untested. DeepREL addresses these limitations by using relational API inference to reach APIs that have no mined test inputs.

DeepREL: Key Contributions

DeepREL proposes a fully automated end-to-end framework centered on the hypothesis that relational APIs (i.e., APIs sharing similar input parameters and outputs) exist within DL libraries. These relational APIs can serve both as a source for "borrowing" test inputs and as test oracles for differential testing. DeepREL's contributions can be delineated as follows:

Automated Relational API Inference: DeepREL automatically infers potential relational API pairs based on syntactic or semantic information about the APIs, extending fuzzing to APIs that existing test inputs do not directly cover.
Dynamic API Relation Verification: Through synthesizing concrete test programs, DeepREL dynamically validates inferred API relations using representative test inputs, thereby ensuring the accuracy of the inferred relations.
Effective Fuzzing Using Relational APIs: Leveraging the verified relational APIs, DeepREL employs mutation-based fuzzing to uncover inconsistencies and potential bugs.
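
The verify-then-fuzz flow described above can be illustrated with a minimal sketch. It is not DeepREL's implementation: it uses functions from Python's standard math module as stand-ins for DL library APIs, and the names run, classify, fuzz, and naive_hypot are hypothetical. The definitions of value and status equivalence here are simplified assumptions (same outputs on the same inputs, and same success-or-exception status, respectively), and naive_hypot contains a deliberately planted bug so that fuzzing has something to find.

```python
import math
import random


def run(api, args):
    """Call api(*args) and report ("success", value) or ("exception", error name)."""
    try:
        return "success", api(*args)
    except Exception as exc:
        return "exception", type(exc).__name__


def same_value(a, b, tol=1e-9):
    """Numeric comparison with a small tolerance (an assumed threshold)."""
    return math.isclose(a, b, rel_tol=tol, abs_tol=tol)


def classify(src, tgt, inputs):
    """Validate a candidate pair on representative inputs.

    Returns "value" if statuses and outputs agree on every input,
    "status" if only the statuses agree, and None otherwise.
    """
    results = [(run(src, args), run(tgt, args)) for args in inputs]
    if not all(s[0] == t[0] for s, t in results):
        return None
    outputs = [(s[1], t[1]) for s, t in results if s[0] == "success"]
    if outputs and all(same_value(a, b) for a, b in outputs):
        return "value"
    return "status"


def fuzz(src, tgt, relation, seeds, rounds=200, rng=None):
    """Mutate borrowed seed inputs and use the verified relation as the oracle."""
    rng = rng or random.Random(0)
    failures = []
    for _ in range(rounds):
        # Mutation: perturb every argument of a randomly chosen seed input.
        args = tuple(a + rng.uniform(-10, 10) for a in rng.choice(seeds))
        (s_status, s_val), (t_status, t_val) = run(src, args), run(tgt, args)
        if s_status != t_status:
            failures.append((args, "status"))
        elif (relation == "value" and s_status == "success"
              and not same_value(s_val, t_val)):
            failures.append((args, "value"))
    return failures


def naive_hypot(x, y):
    # Hypothetical stand-in API with a planted bug: wrong for negative x.
    return math.sqrt(x * x + y * y) if x >= 0 else x + y


# Inputs "borrowed" from the API that already has known test inputs.
seeds = [(3.0, 4.0), (5.0, 12.0)]
relation = classify(math.hypot, naive_hypot, seeds)
failures = fuzz(math.hypot, naive_hypot, relation, seeds)
print(relation, len(failures))
```

In this sketch the pair passes validation on the representative inputs, and the planted bug only surfaces once mutation pushes x below zero. The same classify helper reports a status-only relation for math.floor and math.ceil on inputs such as 2.5 and infinity, because both succeed or both raise while their outputs differ.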

Evaluation and Results

DeepREL was evaluated on two of the most widely used DL libraries, TensorFlow and PyTorch, where it substantially increased API coverage. Specifically, DeepREL covered 157% more APIs than the state-of-the-art FreeFuzz and detected 162 bugs, 106 of which developers have confirmed as previously unknown. It also detected 14 documentation bugs, all confirmed. Notably, DeepREL accounted for 13.5% of the high-priority bugs in the entire PyTorch issue-tracking system over a three-month period.

Implications and Future Directions

The success of DeepREL underscores the feasibility and benefits of applying automated relational API inference in fuzzing DL libraries. This approach not only improves API coverage significantly but also introduces new, powerful test oracles in the form of relational APIs. The results suggest a promising direction for advancing the reliability of DL libraries, which form the backbone of contemporary AI systems. Future work may explore the applicability of DeepREL's methodology to other software systems, enhancement of the framework to reduce false positive rates, and further optimization of the relational API inference mechanism for broader application.

In conclusion, DeepREL enables more comprehensive DL library fuzzing by using inferred relational APIs both to supply test inputs and to act as test oracles, improving the reliability of the libraries on which DL models depend.
