FSD-Inference: Fully Serverless Distributed Inference with Scalable Cloud Communication

(2403.15195)
Published Mar 22, 2024 in cs.DC, cs.AI, and cs.LG

Abstract

Serverless computing offers attractive scalability, elasticity and cost-effectiveness. However, constraints on memory, CPU and function runtime have hindered its adoption for data-intensive applications and ML workloads. Traditional 'server-ful' platforms enable distributed computation via fast networks and well-established inter-process communication (IPC) mechanisms such as MPI and shared memory. In the absence of such solutions in the serverless domain, parallel computation with significant IPC requirements is challenging. We present FSD-Inference, the first fully serverless and highly scalable system for distributed ML inference. We explore potential communication channels, in conjunction with Function-as-a-Service (FaaS) compute, to design a state-of-the-art solution for distributed ML within the context of serverless data-intensive computing. We introduce novel fully serverless communication schemes for ML inference workloads, leveraging both cloud-based publish-subscribe/queueing and object storage offerings. We demonstrate how publish-subscribe/queueing services can be adapted for FaaS IPC with comparable performance to object storage, while offering significantly reduced cost at high parallelism levels. We conduct in-depth experiments on benchmark DNNs of various sizes. The results show that when compared to server-based alternatives, FSD-Inference is significantly more cost-effective and scalable, and can even achieve competitive performance against optimized HPC solutions. Experiments also confirm that our serverless solution can handle large distributed workloads and leverage high degrees of FaaS parallelism.

Figure: High-level architecture of cloud-based serverless distributed inference.

Overview

  • The paper introduces FSD-Inference, a system that enables distributed ML inference in a fully serverless environment, overcoming the memory and CPU constraints that have traditionally limited serverless platforms.

  • FSD-Inference utilizes cloud communication services like publish-subscribe/queueing and object storage to support distributed ML models efficiently, demonstrating a scalable solution for serverless ML workloads.

  • The system introduces intra-layer model parallelism and a hierarchical function launch mechanism, addressing the memory limits of individual serverless instances and reducing function startup delays.

  • FSD-Inference's approach significantly advances the understanding and application of serverless computing for data-intensive and ML tasks, offering a cost-effective and scalable alternative for developers and businesses.

Fully Serverless Distributed Machine Learning Inference with Scalable Cloud Communication

Introduction

Serverless computing has reshaped the cloud landscape with its scalability, elasticity, and cost-effectiveness. Yet its adoption for data-intensive applications, including ML workloads, has been hindered by memory and CPU constraints and by the lack of established inter-process communication (IPC) mechanisms. Addressing this gap, the paper introduces FSD-Inference: a fully serverless system for distributed ML inference that leverages scalable cloud communication services.

The Challenge of Serverless Distributed ML

Achieving distributed ML inference in a serverless environment presents unique challenges. Traditional 'server-ful' platforms support distributed computation via fast networks and IPC mechanisms such as MPI and shared memory. Serverless functions lack such direct communication channels, making parallel computation with significant IPC requirements difficult. Further, serverless platforms impose limits on function runtime and memory capacity, which constrain the deployment of data-intensive workloads.

FSD-Inference: A Novel Approach

FSD-Inference is presented as the first fully serverless, scalable system for distributed ML inference. It introduces fully serverless communication schemes that leverage cloud-based publish-subscribe/queueing and object storage services, allowing distributed ML models to be executed efficiently on serverless infrastructure. By adapting publish-subscribe/queueing services for IPC, FSD-Inference achieves performance comparable to object storage while significantly reducing cost at high levels of parallelism.
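To make the two IPC channels concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of how one serverless worker could pass a partial result to another, either through a queueing service (here AWS SQS) or through object storage (here AWS S3). The queue URL, bucket name, and payload layout are hypothetical, and large tensors would need to be chunked, since SQS caps individual messages at 256 KB.

```python
# Illustrative sketch of two serverless IPC channels: a queue (SQS) and object storage (S3).
# Queue URL, bucket name, and payload layout are hypothetical, not from the paper.
import base64
import json

import boto3
import numpy as np

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/fsd-ipc"  # hypothetical
BUCKET = "fsd-inference-ipc"                                            # hypothetical


def send_via_queue(partial: np.ndarray, sender_id: int) -> None:
    """Publish a worker's partial result as a base64-encoded queue message."""
    payload = {
        "sender": sender_id,
        "shape": partial.shape,
        "data": base64.b64encode(partial.astype(np.float32).tobytes()).decode(),
    }
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))


def receive_via_queue() -> np.ndarray:
    """Poll the queue for one partial result, delete it, and decode it."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=10)
    msg = resp["Messages"][0]
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    payload = json.loads(msg["Body"])
    arr = np.frombuffer(base64.b64decode(payload["data"]), dtype=np.float32)
    return arr.reshape(payload["shape"])


def send_via_object_storage(partial: np.ndarray, key: str) -> None:
    """Write a worker's partial result to object storage under an agreed key."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=partial.astype(np.float32).tobytes())


def receive_via_object_storage(key: str, shape: tuple) -> np.ndarray:
    """Read a partial result back from object storage."""
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return np.frombuffer(body, dtype=np.float32).reshape(shape)
```

Which channel is preferable depends on message sizes and fan-out; the paper reports that the queue-based scheme matches object storage on performance while costing considerably less at high degrees of parallelism.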

The system employs intra-layer model parallelism, which works around the memory limits of individual serverless instances and supports high degrees of parallelism. Additionally, FSD-Inference introduces a hierarchical function launch mechanism that minimizes startup delays and distributes computational work across workers. The paper complements the system with a cost model and practical design recommendations for ML and other data-intensive applications.
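The idea behind intra-layer parallelism can be illustrated with a small sketch: each worker holds only a slice of a fully connected layer's weights, computes its share of the output, and the slices are gathered afterwards (in FSD-Inference, over channels like those sketched above). The row-wise split and the local NumPy driver below are illustrative assumptions, not the paper's exact partitioning scheme.

```python
# Minimal sketch of intra-layer parallelism for one fully connected layer,
# assuming a row-wise split of the weight matrix (an illustrative choice).
import numpy as np


def partition_layer(W: np.ndarray, b: np.ndarray, n_workers: int):
    """Split a layer's weights and bias into n_workers row-wise slices."""
    return list(zip(np.array_split(W, n_workers, axis=0),
                    np.array_split(b, n_workers, axis=0)))


def worker_forward(W_slice: np.ndarray, b_slice: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One worker's share of the layer output (pre-activation)."""
    return W_slice @ x + b_slice


# Driver-side check that the partitioned computation matches the full layer.
rng = np.random.default_rng(0)
W, b, x = rng.standard_normal((512, 256)), rng.standard_normal(512), rng.standard_normal(256)

partials = [worker_forward(Ws, bs, x) for Ws, bs in partition_layer(W, b, n_workers=4)]
y_parallel = np.concatenate(partials)      # gather step (done via queue/object storage in FSD-Inference)
assert np.allclose(y_parallel, W @ x + b)  # matches the unpartitioned layer
```

Because each worker only ever materializes its own weight slice, no single function needs to hold the full layer in memory, which is what lets large models fit within per-function memory limits.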

Theoretical and Practical Implications

FSD-Inference advances the understanding of serverless computing's potential for data-intensive and ML workloads. It shows that, with well-designed communication mechanisms and model partitioning strategies, the serverless model can efficiently handle distributed ML inference tasks previously thought to be beyond its reach.

Practically, this research opens new doors for developers and businesses to leverage serverless computing for complex ML workloads without the traditional barriers associated with server-based systems. It presents a viable alternative that is both cost-effective and scalable, suitable for dynamic or sporadic workloads where traditional server-based solutions may not be practical or cost-efficient.

Future Directions in AI and Cloud Computing

This research has broad implications for the future of AI and cloud computing. As serverless platforms continue to evolve, further optimizations and new platform features could extend the capabilities of systems like FSD-Inference. Future work could focus on supporting more advanced models, lowering communication costs further, and extending the serverless paradigm to more complex, data-intensive computational tasks.

This research sets a foundational step towards fully realizing the potential of serverless computing in supporting sophisticated ML workloads, thereby contributing to the broader goal of making AI more accessible and cost-effective for various applications.
