daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices (1908.05858v1)

Published 16 Aug 2019 in cs.CV, cs.MM, and eess.IV

Abstract: It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has not been open-source implementation in support of this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN --- a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is 7×~23× faster on a single binary convolution, and about 6× faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn.

Citations (62)

Summary

  • The paper introduces daBNN, a framework that accelerates binary neural network inference on ARM devices by optimizing bit-packing, convolution, and memory layout.
  • The paper demonstrates speedups of up to 23× over an existing open-source framework (BMXNet) by leveraging SIMD instructions and a redesigned binary direct convolution.
  • The framework’s open-source BSD license and efficient design promote practical deployment and further exploration of novel BNN architectures on constrained hardware.

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM Devices

The paper, titled "daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices," introduces a highly optimized inference framework specifically designed to implement Binary Neural Networks (BNNs) on ARM devices. This research addresses the limitations of executing DNNs on low-end devices, such as mobile phones, due to their constrained memory and computational capabilities. BNNs offer a potential solution by quantizing weights and activations to binary values, thus facilitating efficient inference through bit-wise operations.
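
To make the bit-wise formulation concrete, the sketch below shows the standard XNOR/popcount trick that BNN inference builds on: once weights and activations are constrained to ±1 and packed one bit per element, a dot product reduces to an XOR (or XNOR) followed by a population count. This is a minimal, generic illustration of the idea, not daBNN's actual kernel, and it assumes the vector length is a multiple of 64.

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstddef>
#include <cstdint>

// Dot product of two {-1, +1} vectors packed one bit per element into
// 64-bit words. Any fixed bit convention works as long as weights and
// activations use the same one: XOR marks positions whose signs disagree,
// so dot = n - 2 * (#disagreements).
// Assumes n is a multiple of 64 (i.e. words == n / 64).
int binary_dot(const std::uint64_t* a, const std::uint64_t* b,
               std::size_t words, std::size_t n) {
    std::size_t disagree = 0;
    for (std::size_t i = 0; i < words; ++i) {
        disagree += std::popcount(a[i] ^ b[i]);  // bit-wise "multiply-accumulate"
    }
    return static_cast<int>(n) - 2 * static_cast<int>(disagree);
}
```

A binary convolution is then a sum of such dot products over the receptive field, which is why the quality of the bit-packing and the memory layout (both discussed below) largely determines end-to-end speed.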

Key Contributions

The primary contribution of this research is the development of the daBNN framework, which is substantially faster than existing alternatives. Notable innovations include:

  1. Bit-Packing Optimization: The authors present an enhanced bit-packing scheme that significantly outperforms traditional methods by employing SIMD instructions to aggregate multiple elements at once, reducing latency roughly fourfold compared to naive sequential packing (see the scalar sketch after this list).
  2. Binary Direct Convolution: This method addresses the inefficiencies of the binary matrix multiplication (BGEMM) approach traditionally used in BNNs, whose core is the XNOR/popcount operation sketched above. By reordering the calculation, daBNN reduces the overhead of the extra instructions BGEMM requires, yielding better performance on both ARMv8 and ARMv7 architectures.
  3. Memory Layout Refinement: The authors introduce a new NC₁HWC₂ memory layout that exploits spatial redundancy in convolution operations, decreasing memory accesses by approximately two-thirds compared to conventional layouts (an illustrative index helper follows this list).
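
As referenced in item 1, the following is a minimal scalar sketch of sign-bit packing, assuming standard IEEE-754 floats: each value is binarized by its sign and 64 signs are gathered into one 64-bit word. daBNN's optimized path does the equivalent gathering with ARM NEON SIMD instructions so that many elements are packed per instruction; the scalar loop below only illustrates what is being computed.

```cpp
#include <cstdint>
#include <cstring>

// Pack the sign bits of 64 consecutive floats into one 64-bit word
// (bit set <=> the value is negative, i.e. binarized to -1).
// Scalar reference version; a SIMD implementation gathers many sign
// bits per instruction instead of one at a time.
std::uint64_t pack_signs64(const float* x) {
    std::uint64_t packed = 0;
    for (int i = 0; i < 64; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &x[i], sizeof(bits));                 // reinterpret float as raw bits
        packed |= static_cast<std::uint64_t>(bits >> 31) << i;   // keep only the sign bit
    }
    return packed;
}
```

For item 3, the hypothetical index helper below illustrates the general shape of an NC₁HWC₂ layout: the C channels are split into C1 groups of C2, and the C2 sub-block is the innermost dimension, so the channels touched at one spatial position sit contiguously in memory. The formula and parameter names illustrate the layout idea only; they are not daBNN's internal code.

```cpp
#include <cstddef>

// Flat offset of element (n, c, h, w) in an NC1HWC2 layout with C = C1 * C2.
// The innermost C2 block keeps a small run of channels contiguous for each
// spatial position, which improves locality for binary convolution.
std::size_t nc1hwc2_offset(std::size_t n, std::size_t c,
                           std::size_t h, std::size_t w,
                           std::size_t C1, std::size_t C2,
                           std::size_t H, std::size_t W) {
    const std::size_t c1 = c / C2;  // channel group index
    const std::size_t c2 = c % C2;  // position within the group
    return ((((n * C1 + c1) * H + h) * W + w) * C2) + c2;
}
```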

Performance Evaluation

Empirical evaluations show substantial improvements in inference speed: daBNN is 7× to 23× faster than BMXNet on single binary convolutions and roughly 6× faster on Bi-Real Net 18. Compared to TensorFlow Lite, it delivers an 8× to 10× speedup on single binary convolutions and about a 3× improvement on Bi-Real Net 18. These results underline the framework's efficiency in practical deployment scenarios.

Implications and Future Work

The implications of this research are twofold:

  • Practical Deployability: daBNN provides an open-source, BSD-licensed solution for deploying binary networks on ARM devices, encouraging broader adoption and experimentation in industry environments where computational efficiency is crucial.
  • Research Opportunities: The framework's availability facilitates the exploration and design of novel BNN structures, offering insight into more computationally efficient architectures.

Looking forward, the authors express interest in extending daBNN's architecture support to x86 and RISC-V platforms, and they anticipate collaborating with research teams to design and refine new BNN structures.

This research contributes a significant advancement in executing neural networks efficiently on constrained hardware, offering both a practical tool for deployment and a foundation for continued innovation in BNN design. The availability of daBNN as an open-source project further cements its potential influence, encouraging developers and researchers alike to leverage and build upon this framework in advancing neural network applications on low-end devices.
