Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy (1806.07840v4)

Published 20 Jun 2018 in cs.DC, cs.AI, cs.CV, cs.MM, and cs.NI

Abstract: As the backbone technology of machine learning, deep neural networks (DNNs) have have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance and energy overhead. While offloading DNNs to the cloud for execution suffers unpredictable performance, due to the uncontrolled long wide-area network latency. To address these challenges, in this paper, we propose Edgent, a collaborative and on-demand DNN co-inference framework with device-edge synergy. Edgent pursues two design knobs: (1) DNN partitioning that adaptively partitions DNN computation between device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. (2) DNN right-sizing that accelerates DNN inference through early-exit at a proper intermediate DNN layer to further reduce the computation latency. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.

Citations (309)

View on Semantic Scholar

Summary

The paper presents Edgent, a framework that dynamically partitions DNN tasks between devices and edge servers to significantly reduce inference latency.
It introduces a DNN right-sizing mechanism that enables early exits, balancing inference accuracy with computation speed.
Empirical results using a Raspberry Pi prototype validate the approach by demonstrating notable latency reductions compared to device-only and edge-only strategies.

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

The research paper titled "Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy" by En Li, Zhi Zhou, and Xu Chen proposes an innovative approach to optimizing deep neural network (DNN) inference through the strategic utilization of both edge computing and mobile devices. This approach addresses significant performance and energy bottlenecks traditionally associated with DNN execution on resource-constrained mobile devices. The proposed framework, termed \textsf{Edgent}, introduces two primary mechanisms—DNN partitioning and DNN right-sizing—to achieve low-latency and efficient edge intelligence.

Key Contributions

DNN Partitioning: The concept of DNN partitioning is leveraged to distribute the computational load between mobile devices and edge servers dynamically. By optimizing the partition point considering current network bandwidth conditions, the framework utilizes available proximate computing resources efficiently, thereby minimizing the latency for real-time DNN inference tasks.
DNN Right-Sizing: This method accelerates DNN inference by enabling an early exit at intermediate DNN layers. It effectively strikes a balance between computational efficiency and inference accuracy, addressing the latency-accuracy tradeoff inherent in early exits. This is particularly beneficial for mission-critical applications with predefined deadlines where moderate accuracy tradeoff is acceptable to meet stringent latency requirements.

Framework Architecture

The proposed \textsf{Edgent} framework consists of three stages: offline training, online optimization, and co-inference. During the offline stage, performance models are developed through profiling, and DNN models are pre-trained with multiple exit points. The online optimization stage involves selecting the optimal exit and partition points based on the current network conditions, predicted layer latencies, and the specified latency requirement. Finally, at the co-inference stage, the DNN computation is executed collaboratively between the edge server and the mobile device according to the determined plan.

Evaluation and Results

The evaluation of \textsf{Edgent} was conducted using a prototype implemented on a Raspberry Pi, simulating a mobile device environment. This prototype was rigorously tested across various bandwidth conditions and latency requirements, demonstrating the effectiveness of the proposed partitioning and right-sizing strategies. Notably, the results indicated significant latency reductions when compared to device-only or edge-only execution strategies, while maintaining a satisfactory level of inference accuracy.

Implications and Future Directions

The proposed \textsf{Edgent} framework constitutes a significant step towards realizing efficient and real-time edge intelligence, particularly relevant as IoT devices and mobile applications increasingly adopt DNN-based models. The framework addresses key challenges associated with bandwidth variability and computational resource limitations, laying a robust foundation for practical deployment in various real-time applications such as augmented reality (AR) and autonomous robotics.

Looking forward, the paper opens several avenues for future research within the domain of AI and edge computing. Potential areas include further refinement of the proposed partitioning and right-sizing strategies to accommodate varying degrees of computational heterogeneity and extending the framework to encompass energy efficiency considerations. Given the fast-paced advancements in edge computing technologies and the continuous evolution of DNN architectures, \textsf{Edgent} provides a compelling blueprint for the development of next-generation intelligent edge solutions.

PDF Markdown