
Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection (1911.04946v1)

Published 9 Nov 2019 in cs.LG, cs.DC, and cs.PF

Abstract: Deep neural networks (DNNs) are becoming a key enabling technology for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inference times and resource requirements of many DNNs. Offloading computation into the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. While compression algorithms often succeed in reducing inference times, they come at the cost of reduced accuracy. This paper presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input, by considering the desired accuracy and inference time. It employs machine learning to develop a low-cost predictive model that quickly selects a pre-trained DNN to use for a given input and optimization constraint. We achieve this by first training a predictive model offline, and then using the learned model to select a DNN for new, unseen inputs. We apply our approach to two representative DNN domains: image classification and machine translation. We evaluate our approach on a Jetson TX2 embedded deep learning platform, considering a range of influential DNN models including convolutional and recurrent neural networks. For image classification, we achieve a 1.8x reduction in inference time with a 7.52% improvement in accuracy over the most capable single DNN model. For machine translation, we achieve a 1.34x reduction in inference time over the most capable single model, with little impact on translation quality.

Citations (56)

Summary

  • The paper's main contribution is an adaptive model selection method that dynamically chooses pre-trained DNNs to optimize both accuracy and inference time.
  • The methodology builds a lightweight "premodel" that inspects features of each input, chosen via correlation-based feature selection, to predict the best-performing DNN under the given constraints.
  • Experimental evaluation on an NVIDIA Jetson TX2 shows up to a 1.8x reduction in inference time for image classification while improving accuracy over the most capable single model.

Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Introduction

The computational demands of deep neural networks (DNNs) pose significant challenges for embedded systems, which are often constrained by limited resources and power. This paper addresses the problem through an adaptive model selection approach that dynamically chooses the most suitable pre-trained DNN for a given input, thus optimizing for accuracy and inference time.

Methodology

The presented approach employs a predictive model—termed the "premodel"—to select the appropriate DNN for each input. This selection process leverages a machine learning model trained offline, which considers various input characteristics and performance constraints. The approach is evaluated across image classification and machine translation domains, showcasing its versatility.
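
To make this two-phase workflow concrete, the sketch below shows an offline phase that trains a cheap classifier mapping input features to the best candidate DNN, and an online phase that consults it before dispatching inference. This is a minimal illustration under assumed helper functions (extract_features, candidate_models); it is not the authors' implementation, which may use a different classifier.

```python
# Minimal sketch of adaptive model selection; all names are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_premodel(train_features, best_model_ids):
    """Offline: learn which candidate DNN performed best for each training input."""
    premodel = KNeighborsClassifier(n_neighbors=5)
    premodel.fit(train_features, best_model_ids)
    return premodel

def adaptive_inference(x, premodel, candidate_models, extract_features):
    """Online: predict a suitable DNN for this input, then run only that DNN."""
    feats = np.asarray(extract_features(x)).reshape(1, -1)  # cheap features only
    model_id = premodel.predict(feats)[0]                   # low premodel overhead
    return candidate_models[model_id](x)                    # run the selected DNN
```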

Key Techniques

  1. Premodel Design: The core of the methodology is the premodel, a low-cost classifier that rapidly decides which DNN to use for each input. At runtime it extracts cheap-to-compute features from the input and applies the offline-trained model to make a dynamic, per-input decision (see the sketch in the Methodology section above).
  2. Feature Selection: The paper stresses the importance of feature selection in building a successful predictive model. Using correlation-based selection and iterative evaluation, the features with the greatest impact on premodel accuracy are identified (see the sketch after this list).
  3. Model Selection Algorithm: A series of model selection strategies determines which DNNs to include in the candidate set the premodel chooses among, balancing accuracy improvements against inference time (a greedy variant is sketched below, after Figure 1).
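
As a hedged illustration of the correlation-based step in item 2 (the paper's exact criterion and thresholds may differ), one can rank candidate features by the strength of their correlation with the prediction target and greedily skip features that are highly correlated with an already selected one:

```python
# Sketch of correlation-based feature selection; the 0.9 threshold is illustrative.
import numpy as np

def select_features(X, y, redundancy_threshold=0.9):
    """Keep features most correlated with y, dropping mutually redundant ones."""
    relevance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    selected = []
    for j in np.argsort(relevance)[::-1]:        # most relevant feature first
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > redundancy_threshold
            for k in selected
        )
        if not redundant:
            selected.append(int(j))
    return selected
```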

Figure 1: Overview of our approach.
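
The model-set construction in item 3 can be pictured as a greedy loop that keeps adding the candidate DNN offering the best accuracy gain per unit of added inference time. This is an assumed greedy formulation for illustration; the paper evaluates its own selection strategies, which may differ.

```python
# Illustrative greedy construction of the set of DNNs the premodel chooses among.
def build_model_set(candidates, marginal_gain, time_cost, min_gain_per_ms=0.01):
    """candidates: model ids; marginal_gain(chosen, m): accuracy gained by adding m;
    time_cost[m]: mean inference time of m in ms. All inputs are assumptions."""
    chosen, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda m: marginal_gain(chosen, m) / time_cost[m])
        if marginal_gain(chosen, best) / time_cost[best] < min_gain_per_ms:
            break                # further models cost more time than they help
        chosen.append(best)
        remaining.remove(best)
    return chosen
```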

Experimental Evaluation

The approach was evaluated on an NVIDIA Jetson TX2 platform across two domains: image classification using various CNNs and machine translation using neural machine translation models. Key findings include:

  • Inference Time: For image classification, the approach achieved a 1.8x reduction in inference time over the most capable single model while also improving accuracy. For machine translation, it reached a 1.34x reduction in inference time with negligible impact on translation quality.
  • Accuracy: The approach improved top-one and top-five accuracy over any single DNN, demonstrating that it can deliver comparable or better accuracy while reducing resource usage (see Figure 2).

Figure 2: Image Classification -- Overall performance of our approach against individual models and an Oracle for inference time (a), energy consumption (b), accuracy (c), and precision, recall, and F1 scores (d).
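
For reference, top-one and top-five accuracy are the standard image-classification metrics cited above; a minimal sketch of how they are computed (illustrative, not taken from the paper's evaluation code):

```python
import numpy as np

def top_k_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k best classes per row
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))
```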

Implications and Future Work

The adaptive model selection strategy demonstrates a promising direction for optimizing embedded inference without requiring cloud offloading, thus preserving privacy and reducing latency. Future work could explore integrating machine learning-based cost modeling to decide when offloading computation to nearby cloud resources is worthwhile.

Conclusion

The research introduces an effective approach for dynamically selecting among multiple DNNs based on input characteristics and performance constraints. By providing a generalizable methodology spanning different types of neural networks, the paper advances the feasibility of deploying complex deep learning models on resource-constrained embedded devices. Further optimization might include processor-specific adaptations and exploring additional model compression techniques for broader applicability.
