
Spot-on: A Checkpointing Framework for Fault-Tolerant Long-running Workloads on Cloud Spot Instances (2210.02589v1)

Published 5 Oct 2022 in cs.DC and q-bio.GN

Abstract: Spot instances offer a cost-effective way to run applications in the cloud. However, running long-running jobs on spot instances is challenging because the instances are subject to unpredictable evictions. Here, we present Spot-on, a generic software framework that supports fault-tolerant long-running workloads on spot instances through checkpoint and restart. Spot-on leverages existing checkpointing packages and is compatible with the major cloud vendors. Using a genomics application as a test case, we demonstrate that Spot-on supports both application-specific and transparent checkpointing methods. Compared with running the same applications on on-demand instances, Spot-on completes these workloads at a significantly lower computing cost. Compared with application-specific checkpoint mechanisms, transparent checkpointing reduces runtime by up to 40%, leading to further cost savings of up to 86%.
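Application-specific checkpointing, one of the two methods the abstract names, means the program itself periodically saves enough state to resume after an eviction; transparent checkpointing tools instead capture the whole process without code changes. The following is a minimal Python sketch of the application-specific pattern only, not Spot-on's implementation. It assumes the job's state fits in a small JSON file and that the cloud's eviction warning reaches the process as SIGTERM. `CheckpointedJob`, `install_eviction_handler`, the file format, and the `max_steps_this_session` parameter (used only to simulate an eviction) are all hypothetical.

```python
import json
import os
import signal
import tempfile


class CheckpointedJob:
    """Hypothetical long-running job that checkpoints progress so it can resume after eviction."""

    def __init__(self, checkpoint_path, total_steps, checkpoint_every=10):
        self.checkpoint_path = checkpoint_path
        self.total_steps = total_steps
        self.checkpoint_every = checkpoint_every
        self.evicted = False
        # Restart: restore state from the last checkpoint if one exists.
        self.step, self.acc = self._load()

    def _load(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                state = json.load(f)
            return state["step"], state["acc"]
        return 0, 0

    def save(self):
        # Write to a temporary file, then atomically rename it, so an eviction
        # during the write cannot leave a corrupt checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.checkpoint_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"step": self.step, "acc": self.acc}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def on_eviction_notice(self, signum, frame):
        # Assumption: the eviction warning is delivered as SIGTERM.
        # Only set a flag here; the main loop checkpoints and exits cleanly.
        self.evicted = True

    def run(self, max_steps_this_session=None):
        """Run until done (returns True) or until evicted (returns False)."""
        done = 0
        while self.step < self.total_steps:
            simulated_eviction = (
                max_steps_this_session is not None and done >= max_steps_this_session
            )
            if self.evicted or simulated_eviction:
                self.save()  # final checkpoint before the instance disappears
                return False
            self.acc += self.step  # stand-in for one unit of real work
            self.step += 1
            done += 1
            if self.step % self.checkpoint_every == 0:
                self.save()  # periodic checkpoint bounds lost work on a hard kill
        self.save()
        return True


def install_eviction_handler(job):
    # Must be called from the main thread (a Python signal-module restriction).
    signal.signal(signal.SIGTERM, job.on_eviction_notice)
```

In use, a fresh instance simply constructs the job with the same checkpoint path and calls `run()` again; it picks up from the saved step rather than from zero. The periodic checkpoint matters because an eviction notice may not always arrive in time for the final save, in which case at most `checkpoint_every` steps are recomputed.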

