DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service (2306.11043v2)
Abstract: Serverless computing is becoming increasingly popular due to its ease of use and fine-grained billing. These features make it appealing for stateful applications and serverless workflows. However, current serverless workflow systems use a controlflow-based invocation pattern to invoke functions. In this pattern, function invocation depends on the state of the preceding functions: a function can begin executing only once all of its precursor functions have completed. As a result, this pattern can lead to longer end-to-end execution time. We design and implement DFlow, a novel dataflow-based serverless workflow system that achieves high performance for serverless workflows. DFlow introduces a distributed scheduler (DScheduler) that uses a dataflow-based invocation pattern to invoke functions. In this pattern, function invocation depends on the data dependencies between functions, so a function can start executing even while its precursor functions are still running. DFlow further features a distributed store (DStore) that uses effective fine-grained optimization techniques to eliminate function interaction, thereby enabling efficient data exchange. With the support of DScheduler and DStore, DFlow achieves an average improvement in 99th-percentile latency of 60% over CFlow, 40% over FaaSFlow, 25% over FaaSFlowRedis, and 40% over KNIX. Further, it improves network bandwidth utilization by 2x-4x over CFlow and by 1.5x-3x over FaaSFlow, FaaSFlowRedis, and KNIX. DFlow also effectively reduces cold startup latency, achieving an average improvement of 5.6x over CFlow and 1.1x over FaaSFlow.
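The difference between the two invocation patterns can be illustrated with a toy timing model. The sketch below is not DFlow's scheduler. It assumes a hypothetical three-function chain (`extract`, `transform`, `load`) with made-up startup and compute times, and it models only one possible source of the gain: under dataflow-based invocation a function's startup may overlap with its precursors' execution, whereas under controlflow-based invocation startup begins only after every precursor has completed. The names `Fn`, `controlflow_finish`, and `dataflow_finish` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Fn:
    name: str
    startup: float  # hypothetical cold-start cost, in seconds
    compute: float  # hypothetical compute time once inputs are present
    preds: list = field(default_factory=list)  # names of precursor functions


def controlflow_finish(fns):
    """Finish times when a function is invoked only after all precursors complete.

    `fns` must be listed in topological order.
    """
    finish = {}
    for f in fns:
        invoked_at = max((finish[p] for p in f.preds), default=0.0)
        # Startup happens after invocation, so it sits on the critical path.
        finish[f.name] = invoked_at + f.startup + f.compute
    return finish


def dataflow_finish(fns):
    """Finish times when every function is invoked at time 0 and waits for data.

    Assumption: a precursor's output becomes available when it finishes.
    """
    finish = {}
    for f in fns:
        data_ready = max((finish[p] for p in f.preds), default=0.0)
        # Startup overlaps with the precursors; compute begins once the
        # container is up and the input data has arrived.
        begin = max(f.startup, data_ready)
        finish[f.name] = begin + f.compute
    return finish


# Hypothetical workflow: extract -> transform -> load
workflow = [
    Fn("extract", startup=0.5, compute=2.0),
    Fn("transform", startup=0.5, compute=1.0, preds=["extract"]),
    Fn("load", startup=0.5, compute=1.0, preds=["transform"]),
]

print(controlflow_finish(workflow)["load"])  # 5.5
print(dataflow_finish(workflow)["load"])     # 4.5
```

In this toy model the dataflow variant saves one startup per downstream function. A real system could gain more if precursors emit partial outputs before they finish, which the model deliberately leaves out.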