Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 37 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 10 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 84 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 448 tok/s Pro
Claude Sonnet 4 31 tok/s Pro
2000 character limit reached

Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig (1503.06600v1)

Published 23 Mar 2015 in cs.DC

Abstract: Cloud computing deals with heterogeneity and dynamicity at all levels and therefore there is a need to manage resources in such an environment and properly allocate them. Resource planning and scheduling requires a proper understanding of arrival patterns and scheduling of resources. Study of workloads can aid in proper understanding of their associated environment. Google has released its latest version of cluster trace, trace version 2.1 in November 2014.The trace consists of cell information of about 29 days spanning across 700k jobs. This paper deals with statistical analysis of this cluster trace. Since the size of trace is very large, Hive which is a Hadoop distributed file system (HDFS) based platform for querying and analysis of Big data, has been used. Hive was accessed through its Beeswax interface. The data was imported into HDFS through HCatalog. Apart from Hive, Pig which is a scripting language and provides abstraction on top of Hadoop was used. To the best of our knowledge the analytical method adopted by us is novel and has helped in gaining several useful insights. Clustering of jobs and arrival time has been done in this paper using K-means++ clustering followed by analysis of distribution of arrival time of jobs which revealed weibull distribution while resource usage was close to zip-f like distribution and process runtimes revealed heavy tailed distribution.

Citations (5)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.