Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning (2403.00766v1)

Published 9 Feb 2024 in cs.AR, cs.DC, and cs.LG

Abstract: This paper addresses the critical challenge of managing Quality of Service (QoS) in cloud services, focusing on the nuances of individual tenant expectations and varying Service Level Indicators (SLIs). It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific QoS management in multi-tenant, multi-accelerator cloud environments. The chosen SLI, deadline hit rate, allows clients to tailor QoS for each service request. A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed, with a focus on guaranteeing tenant-wise, model-specific QoS levels while considering real-time constraints.
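The abstract's chosen SLI, deadline hit rate, can be illustrated with a minimal sketch. This is not the paper's implementation; the `Request` record and `hit_rate` helper below are hypothetical names introduced purely to show how a tenant-wise, model-specific hit rate could be computed from a request log.

```python
# Illustrative sketch only (not the paper's code): per-tenant, per-model
# deadline hit rate as a QoS indicator. `Request` and `hit_rate` are
# hypothetical names for this example.
from dataclasses import dataclass

@dataclass
class Request:
    tenant: str
    model: str
    deadline: float      # absolute deadline (seconds)
    completion: float    # actual completion time (seconds)

def hit_rate(requests, tenant, model):
    """Fraction of a tenant's requests for a given model that met their deadline."""
    relevant = [r for r in requests if r.tenant == tenant and r.model == model]
    if not relevant:
        return None
    hits = sum(1 for r in relevant if r.completion <= r.deadline)
    return hits / len(relevant)

log = [
    Request("tenantA", "resnet50", deadline=0.10, completion=0.08),
    Request("tenantA", "resnet50", deadline=0.10, completion=0.12),
    Request("tenantA", "resnet50", deadline=0.10, completion=0.09),
    Request("tenantB", "bert",     deadline=0.20, completion=0.15),
]
print(hit_rate(log, "tenantA", "resnet50"))  # 2 of 3 deadlines met -> ~0.667
```

A scheduler targeting tenant-specific QoS would compare such a measured hit rate against each tenant's requested level when prioritizing pending requests.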
