Anomaly Analysis for Co-located Datacenter Workloads in the Alibaba Cluster (1811.06901v1)

Published 14 Nov 2018 in cs.DC

Abstract: In warehouse-scale cloud datacenters, co-locating online services and offline batch jobs is an efficient way to improve datacenter utilization. To help the community understand the interactions among co-located workloads and their real-world operational demands, Alibaba recently released a cluster usage and co-located workload dataset, the first publicly available dataset with precise information about the category of each job. In this paper, we perform an in-depth analysis of the released Alibaba workload dataset from the perspective of anomaly analysis and diagnosis. Through data preprocessing, node similarity analysis based on Dynamic Time Warping (DTW), characterization of the co-located workloads, and anomaly analysis based on Isolation Forest (iForest), we reveal several insights:
(1) The performance discrepancy among machines in Alibaba's production cluster is relatively large because the distribution and resource utilization of co-located workloads are unbalanced. For instance, the resource utilization (especially memory utilization) of batch jobs fluctuates and is less stable than that of online containers, because online containers are long-running and more memory-demanding, whereas most batch jobs are short-lived.
(2) Based on the distribution of co-located workload instance counts, the machines can be classified into 8 workload distribution categories, and within the same category most machines show similar resource utilization curve patterns.
(3) Apart from system failures, unreasonable scheduling and workload imbalance are the main causes of anomalies in Alibaba's cluster.
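The two techniques named in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' code: the function names (`dtw_distance`, `iforest_scores`), the absolute-difference DTW cost, the restriction of the isolation forest to one-dimensional values, and the toy utilization series are all illustrative assumptions. DTW assigns a small distance to two utilization curves that share a shape even when one is locally stretched in time, which is why it suits comparing machine curves; an isolation forest scores each point by how few random splits are needed to isolate it, so scores closer to 1 suggest anomalies.

```python
import math
import random
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with |x - y| as the local cost (assumed)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def _avg_path(n: int) -> float:
    """c(n): average unsuccessful-search path length in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build_tree(sample: List[float], depth: int, limit: int, rng: random.Random) -> tuple:
    """Recursively split a 1-D sample at random points until isolation or the depth limit."""
    if depth >= limit or len(sample) <= 1 or min(sample) == max(sample):
        return ("leaf", len(sample))
    split = rng.uniform(min(sample), max(sample))
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return ("node", split,
            _build_tree(left, depth + 1, limit, rng),
            _build_tree(right, depth + 1, limit, rng))


def _path_length(x: float, tree: tuple, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) to account for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + _avg_path(tree[1])
    _, split, left, right = tree
    return _path_length(x, left if x < split else right, depth + 1)


def iforest_scores(data: List[float], n_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> List[float]:
    """Hypothetical 1-D isolation-forest scorer; values near 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    limit = math.ceil(math.log2(psi))  # height limit ceil(log2(psi)) from the iForest paper
    trees = [_build_tree(rng.sample(data, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = _avg_path(psi)
    scores = []
    for x in data:
        mean_depth = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

For example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is 0.0 because warping absorbs the repeated value, and with the made-up memory-utilization fractions `usage = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.53, 0.47, 0.50, 0.51, 0.95]`, `iforest_scores(usage)` gives the final 0.95 reading the highest score. For real multi-metric cluster traces, scikit-learn's `sklearn.ensemble.IsolationForest` is an established implementation; the pure-Python version above trades speed for transparency.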
