Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 47 tok/s
Gemini 2.5 Pro 44 tok/s Pro
GPT-5 Medium 13 tok/s Pro
GPT-5 High 12 tok/s Pro
GPT-4o 64 tok/s Pro
Kimi K2 160 tok/s Pro
GPT OSS 120B 452 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis (2402.03694v2)

Published 6 Feb 2024 in cs.NI and cs.AI

Abstract: Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 1.8x - 141.3x, while the inter-packet waiting time is up to 6-8 orders of magnitude higher than the inference time! Based on these insights, we tailor a novel fast-slow model architecture for networking ML pipelines. Flows are assigned to a slower model only when the inferences from the fast model are deemed high uncertainty. ServeFlow is able to make inferences on 76.3% of flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.

Citations (1)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com