Papers
Topics
Authors
Recent
2000 character limit reached

An Efficient Fault Tolerant Workflow Scheduling Approach using Replication Heuristics and Checkpointing in the Cloud (1810.06361v2)

Published 15 Oct 2018 in cs.DC

Abstract: Scientific workflows have been predominantly used for complex and large scale data analysis and scientific computation/automation and the need for robust workflow scheduling techniques has grown considerably. But, most of the existing workflow scheduling algorithms do not provide the required reliability and robustness. In this paper, a new fault tolerant workflow scheduling algorithm that learns replication heuristics in an unsupervised manner has been proposed. Furthermore, the use of light weight synchronized checkpointing enables efficient resubmission of failed tasks and ensures workflow completion even in precarious environments. The proposed technique improves upon metrics like Resource Wastage and Resource Usage in comparison to the Replicate-All algorithm, while maintaining an acceptable increase in Makespan as compared to the vanilla Heterogeneous Earliest Finish Time (HEFT).

Citations (34)

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube