JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services (1801.08618v2)

Published 25 Jan 2018 in cs.DC, cs.AI, cs.LG, and cs.PF

Abstract: Deep learning models are being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants, autonomous cars, and smart home services often employ either simple local models on the mobile or complex remote models on the cloud. However, recent studies have shown that partitioning the DNN computations between the mobile and cloud can increase the latency and energy efficiencies. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and cloud for DNNs in both inference and training phase. JointDNN not only provides an energy and performance efficient method of querying DNNs for the mobile side but also benefits the cloud server by reducing the amount of its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward- and backward-propagations in DNNs, which can adapt to mobile battery limitations and cloud server load constraints and quality of service. JointDNN achieves up to 18 and 32 times reductions on the latency and mobile energy consumption of querying DNNs compared to the status-quo approaches, respectively.

Citations (235)


Summary

The paper introduces JointDNN, a novel framework that partitions DNN tasks between mobile devices and cloud servers.
It combines a DAG-based optimization with lossless compression of intermediate layer outputs, achieving up to 18x lower latency and up to 32x lower mobile energy consumption than the status-quo approaches.
The framework reduces server load by up to 84%, offering scalable, cost-effective solutions for real-time AI applications.

Analysis of JointDNN: A Mobile-Cloud Collaborative Framework for DNN Computations

The paper "JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services" proposes a framework that partitions Deep Neural Network (DNN) computations between a mobile device and a cloud server, covering both the inference and training phases. The goal is to reduce latency and mobile energy consumption relative to the conventional mobile-only and cloud-only deployments.

Efficient Mobile-Cloud Collaboration

JointDNN partitions DNN computations across mobile and cloud systems at layer granularity, so that each layer of a network can be processed on either platform. This adaptive strategy weighs the hardware constraints of the mobile device against the computational power of the cloud server and the cost of moving data between them. JointDNN models the DNN structure as a Directed Acyclic Graph (DAG): it first profiles the latency and energy consumption of each layer, then finds the optimal computation schedule with techniques akin to solving a shortest path problem on that graph.
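The shortest-path idea can be sketched for the simplest case, a chain of layers whose input starts on the mobile device and whose result must end there. This is a minimal illustration, not the paper's implementation: the function name `partition_chain`, the cost-array layout, and all numbers are assumptions made up for the example. Each state is "the input of layer i is on the mobile" or "on the cloud", and each step either runs the layer in place or pays a transfer cost to switch sides.

```python
from typing import List, Tuple

State = Tuple[float, List[str]]  # (accumulated cost, placement chosen so far)


def partition_chain(mobile: List[float], cloud: List[float],
                    upload: List[float], download: List[float]) -> State:
    """Cheapest placement of a chain of layers (hypothetical cost model).

    mobile[i], cloud[i]: cost of running layer i on each side.
    upload[i], download[i]: cost of moving the input of layer i to the
    cloud or to the mobile; index len(mobile) refers to the final output.
    """
    n = len(mobile)
    at_mobile: State = (0.0, [])        # the raw input starts on the mobile
    at_cloud: State = (upload[0], [])   # or it is uploaded before layer 0
    for i in range(n):
        ran_mobile = (at_mobile[0] + mobile[i], at_mobile[1] + ["mobile"])
        ran_cloud = (at_cloud[0] + cloud[i], at_cloud[1] + ["cloud"])
        # After layer i, its output may stay where it is or switch sides.
        moved_down = (ran_cloud[0] + download[i + 1], ran_cloud[1])
        moved_up = (ran_mobile[0] + upload[i + 1], ran_mobile[1])
        at_mobile = min(ran_mobile, moved_down, key=lambda s: s[0])
        at_cloud = min(ran_cloud, moved_up, key=lambda s: s[0])
    return at_mobile                    # the result must end on the mobile


# Illustrative costs only: early outputs are large and expensive to send,
# later layers are expensive to compute on the mobile.
cost, plan = partition_chain(
    mobile=[2.0, 3.0, 20.0, 15.0],
    cloud=[0.5, 0.5, 1.0, 1.0],
    upload=[30.0, 12.0, 2.0, 2.0, 2.0],
    download=[30.0, 12.0, 2.0, 2.0, 0.5],
)
print(cost, plan)  # 9.5 ['mobile', 'mobile', 'cloud', 'cloud']
```

With these made-up numbers the mobile-only schedule costs 40.0 and the cloud-only schedule costs 33.5, so the split after the second layer is cheaper than either. The same cost arrays can hold latency or energy, depending on which quantity is being minimized.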

Numerical Insights and Implications

The experimental results show that JointDNN substantially improves both latency and energy efficiency compared to the status-quo approaches. The paper reports reductions of up to 18 times in latency and up to 32 times in mobile energy consumption when querying DNNs. These results come from a collaborative execution model in which some layers run on the mobile device and the rest on the cloud server, which also reduces the server's workload and communication compared to the cloud-only approach.

The research identifies scenarios in which communication cost dominates the total cost, particularly under the cloud-only approach. To address this, the paper explores lossless compression of the intermediate outputs of convolutional layers, which substantially reduces the amount of data transferred. Such techniques improve communication efficiency and point to future work on model architectures tailored to collaborative execution.
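The effect of lossless compression on an intermediate output can be sketched as follows. This is an illustration under stated assumptions, not the paper's compression scheme: `zlib` is a stand-in codec, the helper names are hypothetical, and the activations are synthetic values with an assumed 80% share of zeros, as a ReLU layer might produce.

```python
import random
import struct
import zlib
from typing import List


def make_sparse_activations(count: int, sparsity: float, seed: int = 0) -> List[float]:
    """Synthetic post-ReLU values: zero with probability `sparsity`."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < sparsity else rng.uniform(0.0, 6.0)
            for _ in range(count)]


def pack_float32(values: List[float]) -> bytes:
    """Serialize values as little-endian float32, as they might be sent."""
    return struct.pack(f"<{len(values)}f", *values)


raw = pack_float32(make_sparse_activations(count=4096, sparsity=0.8))
compressed = zlib.compress(raw, 6)

# Lossless: the receiver recovers exactly the bytes that were packed.
assert zlib.decompress(compressed) == raw
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

The long runs of zero bytes are what the codec exploits, so the saving depends on how sparse the layer output actually is.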

Theoretical and Practical Contributions

From a theoretical standpoint, JointDNN highlights the potential of hybrid computational models in AI and the role of mobile-cloud collaboration in optimizing deep learning workloads. Its mathematical modeling includes integer linear programming (ILP) formulations for constrained settings (such as mobile battery limits, cloud server load, and quality-of-service targets), which provides a basis for real-time adaptation in dynamic network environments.
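To make the constrained setting concrete, one possible ILP for a chain of n layers is written out here. It is an illustrative formulation, not the one given in the paper, and every symbol is an assumption introduced for the example: x_i = 1 places layer i on the mobile, u_i and d_i mark an upload or download between layers i and i+1, T denotes latency, E denotes mobile energy, and E_budget is the battery limit.

```latex
\begin{aligned}
\min_{x,\,u,\,d}\quad
  & \sum_{i=1}^{n} \bigl( T^{m}_{i}\, x_{i} + T^{c}_{i}\,(1 - x_{i}) \bigr)
    + \sum_{i=0}^{n} \bigl( T^{u}_{i}\, u_{i} + T^{d}_{i}\, d_{i} \bigr) \\
\text{s.t.}\quad
  & u_{i} \ge x_{i} - x_{i+1}, \qquad d_{i} \ge x_{i+1} - x_{i},
    \qquad i = 0, \dots, n, \\
  & \sum_{i=1}^{n} E^{m}_{i}\, x_{i}
    + \sum_{i=0}^{n} \bigl( E^{u}_{i}\, u_{i} + E^{d}_{i}\, d_{i} \bigr)
    \le E_{\text{budget}}, \\
  & x_{0} = x_{n+1} = 1, \qquad x_{i},\, u_{i},\, d_{i} \in \{0, 1\}.
\end{aligned}
```

The two inequality families linearize the transfer terms: u_i is forced to 1 exactly when layer i runs on the mobile and layer i+1 on the cloud, and because all costs are non-negative the solver leaves it at 0 otherwise. Fixing x_0 and x_{n+1} to 1 encodes that the input originates on the mobile and the result must return to it.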

Practically, JointDNN showcases the ability to reduce server load by up to 84%, suggesting considerable economic benefits and scalability potential for cloud service providers. Such efficiencies could reshape how intelligent services are delivered, enabling more robust real-time processing capabilities for applications like autonomous vehicles and mobile robotics.

Future Developments in AI

The implications of this work extend to the design of future AI applications where resource constraints and real-time processing are pivotal. Given the rapid advancements in network technologies, such as 5G, the relevance of optimized mobile-cloud DNN deployments will likely continue to grow. Moreover, as more sophisticated DNN models emerge, exploring granularity beyond layer-level execution, incorporating data dependencies, or even considering more complex topological transformations could significantly enhance the applicability of frameworks like JointDNN.

In conclusion, this paper exemplifies a compelling direction in AI research that navigates the practical challenges of deploying high-complexity models in real-world environments, bridging the gap between mobile and cloud computing resources. The insights offered align well with growing trends in energy-efficient computing and distributed AI, providing a foundation for future exploration and innovation in collaborative intelligent systems.
