SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud (2008.06402v2)

Published 14 Aug 2020 in cs.LG, cs.CV, cs.DC, and stat.ML

Abstract: Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.

Summary

The paper introduces a progressive inference method that adds early exits to CNNs, allowing inference to stop early and trade accuracy against latency at run time.
The system employs condition-aware scheduling to partition CNN execution between device and cloud, adapting to changing network and resource conditions.
SPINN integrates a CNN-specific communication optimizer that compresses intermediate data, effectively reducing transmission overhead and enhancing throughput.
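
The communication optimizer summarized above can be pictured with a minimal sketch. The summary does not specify the quantization scheme or the compressor, so the choices here are assumptions made for illustration: uniform 8-bit linear quantization followed by zlib from the Python standard library. The function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def quantize_8bit(activations: List[float]) -> Tuple[bytes, float, float]:
    """Map floats onto 256 evenly spaced levels (assumed scheme, not SPINN's own)."""
    lo, hi = min(activations), max(activations)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # Clamp to 255 to guard against floating-point overshoot at the top of the range.
    codes = bytes(min(255, round((a - lo) / scale)) for a in activations)
    return codes, lo, scale


def pack(activations: List[float]) -> Tuple[bytes, float, float]:
    """Quantize, then losslessly compress the byte codes before transmission."""
    codes, lo, scale = quantize_8bit(activations)
    return zlib.compress(codes), lo, scale


def unpack(payload: bytes, lo: float, scale: float) -> List[float]:
    """Server side: decompress and map the codes back to approximate floats."""
    return [lo + code * scale for code in zlib.decompress(payload)]


# Toy post-ReLU tensor: mostly zeros, which compresses well.
activations = [0.0] * 900 + [i / 100 for i in range(100)]
payload, lo, scale = pack(activations)
restored = unpack(payload, lo, scale)
```

Quantization is the lossy step, with a per-value error of at most half a quantization step, while the compression step is lossless. Sparse activations, such as those following a ReLU, tend to benefit most from the second step.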

Synergistic Progressive Inference of Neural Networks (SPINN) for Device-Cloud Cooperation

The paper "SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud" presents a framework that partitions convolutional neural network (CNN) workloads between mobile devices and cloud servers. The proposed system, SPINN, targets two weaknesses of existing CNN inference solutions: inefficient use of the combined mobile and cloud resources, and a lack of robustness to variable network conditions and device capabilities.

Key Contributions

SPINN combines four components:

Progressive Inference Networks: SPINN uses a progressive inference method that places multiple early exits throughout the network, so that the amount of computation adapts to input complexity and the desired confidence level. Inference terminates at the first exit whose prediction is sufficiently confident, which lets the system trade accuracy against latency.
Collaboration Between Device and Cloud: The framework introduces a scheduler that selects the CNN split point and the early-exit policy at run time. Computation can therefore be reallocated between device and server according to current network conditions, device resources, and user-defined service-level requirements.
CNN-Specific Communication Optimizer: SPINN implements a data-compression module that reduces the size of intermediate CNN activation data through quantization and compression, lowering network transmission overhead.
Condition-Aware Scheduling: The scheduler weighs latency, throughput, server and device costs, and accuracy in a multi-objective optimization framework. This allows SPINN to keep meeting application-specific performance requirements as conditions change.
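
The early-exit rule described under Progressive Inference Networks can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes the confidence measure is the top-1 softmax probability, and the names `progressive_infer`, `stages`, and `heads` are hypothetical. Each stage stands in for a block of backbone layers, and each head for the classifier attached at that exit.

```python
import math
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[List[float]], List[float]]  # backbone block: features -> features
Head = Callable[[List[float]], List[float]]   # exit classifier: features -> logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def progressive_infer(
    x: List[float],
    stages: Sequence[Stage],
    heads: Sequence[Head],
    threshold: float,
) -> Tuple[int, int, float]:
    """Return (label, exit_index, confidence) from the first confident exit."""
    features = x
    for i, (stage, head) in enumerate(zip(stages, heads)):
        features = stage(features)
        probs = softmax(head(features))
        confidence = max(probs)
        is_final_exit = i == len(stages) - 1
        # Stop as soon as an exit is confident enough; the final exit always answers.
        if confidence >= threshold or is_final_exit:
            return probs.index(confidence), i, confidence
    raise ValueError("at least one stage is required")


# Toy network: every stage doubles the features, so confidence grows with depth.
stages = [lambda f: [2 * v for v in f]] * 3
heads = [lambda f: f] * 3
label, exit_index, confidence = progressive_infer([1.0, 0.5], stages, heads, 0.8)
```

Raising the threshold pushes more samples to later exits, increasing latency in exchange for accuracy; lowering it does the opposite. In a device-cloud split, any stage after the split point would run on the server rather than locally.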

Performance and Evaluation

Through extensive experimental evaluations over multiple CNN models and real-life scenarios, SPINN demonstrated substantial improvements in throughput and reliability compared to device-only, cloud-only, and traditional device-cloud collaborative approaches:

Throughput and Latency: SPINN achieved higher throughput than state-of-the-art counterparts across diverse network conditions and device capabilities, by up to 2x according to the abstract. The gain is attributed mainly to the early-exit mechanism combined with run-time model partitioning, which together avoid unnecessary computation and network transfers.
Robustness Under Network Variability: SPINN sustained performance under network fluctuations better than existing methods. Because it can fall back to local processing, it maintained satisfactory performance across a range of connectivity scenarios, from 4G to 5G networks.
Server Load Management: By taking server load into account in its scheduling decisions, SPINN adjusted its configuration to mitigate server-side computational constraints, requiring less server time while maintaining comparable accuracy.
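
The scheduling behavior evaluated above can be pictured with a small sketch that picks a split point and exit threshold from profiled candidates. It is a deliberate simplification of the multi-objective scheduler: it minimizes estimated latency subject to an accuracy floor only. All names and profile numbers are hypothetical, and each candidate's timings are assumed to be averages that already account for samples exiting early.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    split: int          # layers before this index run on the device
    threshold: float    # early-exit confidence threshold
    device_ms: float    # profiled average on-device compute time
    server_ms: float    # profiled average server compute time
    transfer_kb: float  # average compressed activation size sent upstream
    accuracy: float     # profiled accuracy for this split/threshold pair


def estimated_latency_ms(c: Candidate, bandwidth_mbps: float, rtt_ms: float) -> float:
    """Device time plus, when anything is offloaded, network and server time."""
    if c.transfer_kb == 0:
        return c.device_ms  # device-only execution never touches the network
    transfer_ms = c.transfer_kb * 8 / bandwidth_mbps  # kilobits / (kilobits per ms)
    return c.device_ms + rtt_ms + transfer_ms + c.server_ms


def select(
    candidates: List[Candidate],
    bandwidth_mbps: float,
    rtt_ms: float,
    min_accuracy: float,
) -> Optional[Candidate]:
    """Fastest candidate that meets the accuracy requirement, or None."""
    feasible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not feasible:
        return None
    return min(feasible, key=lambda c: estimated_latency_ms(c, bandwidth_mbps, rtt_ms))


# Made-up profile: cloud-only, a mid-network split, and device-only execution.
PROFILE = [
    Candidate(0, 0.8, 0.0, 20.0, 150.0, 0.76),
    Candidate(5, 0.8, 30.0, 10.0, 40.0, 0.75),
    Candidate(10, 0.8, 90.0, 0.0, 0.0, 0.74),
]

fast_link = select(PROFILE, bandwidth_mbps=50, rtt_ms=10, min_accuracy=0.7)
slow_link = select(PROFILE, bandwidth_mbps=2, rtt_ms=10, min_accuracy=0.7)
```

With these made-up numbers the fast link favors full offloading and the slow link favors device-only execution, while an intermediate bandwidth of 10 Mbps selects the mid-network split. Re-running the selection whenever the measured bandwidth or server load changes gives the adaptive behavior the text describes; a fuller version would also fold server cost and throughput into the objective.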

Implications and Future Prospects

SPINN's design is most relevant to AI applications that require adaptive low-latency processing, such as real-time drone navigation, augmented reality, and other mission-critical mobile applications. By making offloading decisions that reflect current execution conditions, SPINN saves energy compared to cloud-centric execution and makes better use of available resources, which is important for extending the capabilities of smaller devices and improving user experience.

Looking forward, SPINN could be extended to optimize energy consumption further, support more complex model architectures, adapt to additional environmental conditions, and handle multi-client scenarios in which clients have unequal access to resources. As AI deployment scales and cloud-edge-device cooperation becomes more important, SPINN's methodology offers an adaptable blueprint for future distributed inference frameworks.
