CASH: A Credit Aware Scheduling for Public Cloud Platforms (2009.04561v2)

Published 9 Sep 2020 in cs.DC

Abstract: The public cloud offers a myriad of services that allow its tenants to process large-scale big data in a flexible, easy, and cost-effective manner. Tenants generally use large-scale data processing frameworks such as MapReduce, Tez, and Spark to process their data. Tenants can configure their frameworks to schedule individual tasks themselves or have a middleware cluster manager such as YARN or Mesos arbitrate resource scheduling in their public-cloud cluster. Cluster managers need to be cognizant of the workload requirements as well as the state of individual resources, such as CPU and disk, in the cluster. Cloud providers use a token bucket mechanism for their individual hardware resources as an indicator of the quality of service that each hardware resource can provide. In this paper, through our changes to YARN, Hadoop, and Tez, we show how middleware cluster managers can be made cognizant of the expected quality of service of individual hardware resources in the cluster. Our optimized cluster manager, with coarse-grained knowledge of task requirements and fine-grained knowledge of the expected quality of service of hardware resources in the cluster, performs highly optimal task placements. Our experiments with these optimizations show that CPU-credit-based instances such as the Amazon T3 instances are a viable, cost-effective option for running big data workloads. We also show that streaming SQL queries on a Hive warehouse can be accelerated by up to 31%, leading to public cloud cost savings of up to 22%.
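
The token bucket mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation or the YARN API: the `TokenBucket` class, the `pick_node` helper, the node names, and all numeric values are hypothetical, chosen only to show how a scheduler might prefer the node whose resource has the most remaining credits.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """Hypothetical credit bucket for one resource (e.g. CPU) on one node."""
    capacity: float     # maximum credits the bucket can hold
    refill_rate: float  # credits earned per second
    tokens: float       # credits currently available

    def refill(self, elapsed_s: float) -> None:
        # Credits accrue over time but never exceed the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_rate * elapsed_s)

    def consume(self, amount: float) -> bool:
        # Spend credits if enough remain; otherwise signal that the
        # resource would be throttled to its baseline performance.
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


def pick_node(buckets: Dict[str, TokenBucket], demand: float) -> Optional[str]:
    """Return the node with the most credits that can cover `demand`, if any.

    Assumption: a greedy "most credits first" rule stands in for whatever
    placement policy a real credit-aware scheduler would use.
    """
    eligible = {name: b for name, b in buckets.items() if b.tokens >= demand}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name].tokens)


# Hypothetical two-node cluster with different remaining CPU credits.
cluster = {
    "node-a": TokenBucket(capacity=100, refill_rate=1, tokens=10),
    "node-b": TokenBucket(capacity=100, refill_rate=1, tokens=60),
}
chosen = pick_node(cluster, demand=30)
```

Under these assumed values, a task that needs 30 credits is placed on `node-b`, because `node-a` holds too few credits to run it without throttling. A real cluster manager would also have to weigh data locality, memory, and disk state, which this sketch omits.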

Citations (3)
