An Interference-aware Approach for Co-located Container Orchestration with Novel Metric (2402.08917v1)
Abstract: Container orchestration technologies are widely used in cloud computing to co-locate online and offline services on the same infrastructure. Online services demand rapid responses and high availability, whereas offline services require extensive computational resources. This mixed deployment can cause resource contention that degrades the performance of online services, yet the metrics used by existing methods do not accurately reflect the extent of this interference. In this paper, we introduce scheduling latency as a novel metric for quantifying interference and compare it with existing metrics. Empirical evidence shows that scheduling latency more accurately reflects the performance degradation of online services. We apply several machine learning techniques to predict the interference an online service is likely to experience on a specific host, providing reference information for subsequent scheduling decisions, and we propose a method for quantifying node interference based on scheduling latency. To improve resource utilization, we train a model that predicts the CPU and memory allocation of an online service from its workload type and QPS (queries per second). Finally, we present a scheduling algorithm built on these predictive models that aims to reduce interference with online services while balancing resource utilization across nodes. Experiments against three baseline methods demonstrate the effectiveness of our approach: it reduces the average, 90th-percentile, and 99th-percentile response times of online services by 29.4%, 31.4%, and 14.5%, respectively.
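The abstract does not specify how scheduling latency is measured. On Linux, one plausible proxy is run-queue wait time. When scheduler statistics are enabled, the kernel exposes three counters per thread in `/proc/<pid>/task/<tid>/schedstat`: time spent on a CPU (ns), time spent runnable but waiting on a run queue (ns), and the number of timeslices run. The minimal sketch below assumes this proxy and does not reproduce the authors' definition. It samples the counters twice for one process and reports the mean run-queue wait per timeslice. `SchedStat`, `parse_schedstat`, `snapshot`, `scheduling_latency_ns`, and `sample` are hypothetical names chosen for illustration.

```python
import glob
import time
from typing import NamedTuple


class SchedStat(NamedTuple):
    cpu_ns: int      # time spent running on a CPU
    wait_ns: int     # time spent runnable but waiting on a run queue
    timeslices: int  # number of timeslices run on a CPU


def parse_schedstat(text: str) -> SchedStat:
    """Parse the three whitespace-separated counters of a schedstat file."""
    cpu_ns, wait_ns, slices = (int(x) for x in text.split()[:3])
    return SchedStat(cpu_ns, wait_ns, slices)


def snapshot(pid: int) -> SchedStat:
    """Sum the counters over every thread of `pid` (Linux only)."""
    total = SchedStat(0, 0, 0)
    for path in glob.glob(f"/proc/{pid}/task/*/schedstat"):
        try:
            with open(path) as f:
                stat = parse_schedstat(f.read())
        except OSError:
            continue  # the thread exited between glob() and open()/read()
        total = SchedStat(*(a + b for a, b in zip(total, stat)))
    return total


def scheduling_latency_ns(before: SchedStat, after: SchedStat) -> float:
    """Mean run-queue wait per timeslice between two snapshots (assumed proxy)."""
    slices = after.timeslices - before.timeslices
    if slices <= 0:
        return 0.0  # no new timeslices in the interval, nothing to average
    return (after.wait_ns - before.wait_ns) / slices


def sample(pid: int, interval_s: float = 1.0) -> float:
    """Take two snapshots `interval_s` apart and return the latency proxy."""
    before = snapshot(pid)
    time.sleep(interval_s)
    return scheduling_latency_ns(before, snapshot(pid))
```

For example, `sample(os.getpid(), 0.5)` reports the proxy for the calling process. For a container, one would more likely sum the counters over every process in its cgroup rather than over a single PID. Threads that exit between the two snapshots also skew the deltas, so a long-running sampler would track counters per thread instead.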