Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach (2309.07265v2)
Abstract: The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments lags behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when encountering previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that combines the advantages of the policy reuse and policy distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment covering multiple services, including real VR gaming traffic, to reflect practical O-RAN slicing scenarios. We also implement three baselines for comparison: policy reuse-aided DRL, distillation-aided DRL, and non-TL-aided DRL. Compared with these baselines, the proposed hybrid approach achieves at least a 7.7% improvement in the average initial reward value, a 20.7% increase in the percentage of converged scenarios, and a 64.6% reduction in reward variance, while maintaining fast convergence and improving generalizability.
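To make the hybrid approach concrete, below is a minimal PyTorch sketch of how its two TL ingredients could be combined around a PPO learner: the learner is warm-started from a previously trained expert policy (policy reuse), and a distillation term that penalizes divergence from the expert's action distribution is added to the learner's training loss (policy distillation). The function names, the `distill_coef` and `temperature` parameters, and the exact loss form are illustrative assumptions, not the paper's implementation.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def warm_start(learner: nn.Module, expert: nn.Module) -> None:
    """Policy reuse: initialize the learner's network with the expert's weights.

    Assumes learner and expert share the same architecture.
    """
    learner.load_state_dict(copy.deepcopy(expert.state_dict()))


def hybrid_loss(ppo_loss: torch.Tensor,
                learner_logits: torch.Tensor,
                expert_logits: torch.Tensor,
                distill_coef: float = 0.5,      # illustrative weight, not from the paper
                temperature: float = 1.0) -> torch.Tensor:
    """Policy distillation: add a KL term pulling the learner's action
    distribution toward the frozen expert's on the same batch of states."""
    # Soften both distributions with the temperature; detach the expert
    # so no gradient flows into it.
    expert_probs = F.softmax(expert_logits.detach() / temperature, dim=-1)
    learner_log_probs = F.log_softmax(learner_logits / temperature, dim=-1)
    kl = F.kl_div(learner_log_probs, expert_probs, reduction="batchmean")
    return ppo_loss + distill_coef * kl
```

In such a setup, `distill_coef` would typically be annealed toward zero over training, so the expert's guidance dominates early on (yielding a safe initial policy) and fades as the learner adapts to the new slicing scenario.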
- O-RAN Working Group 2, “O-RAN AI/ML workflow description and requirements – v1.01,” O-RAN.WG2.AIML-v01.01 Technical Specification, April 2020.
- A. Garcia-Saavedra and X. Costa-Pérez, “O-RAN: Disrupting the virtualized RAN ecosystem,” IEEE Communications Standards Magazine, vol. 5, no. 4, pp. 96–103, 2021.
- A. S. Abdalla, P. S. Upadhyaya, V. K. Shah, and V. Marojevic, “Toward next generation open radio access networks: What O-RAN can and cannot do!” IEEE Network, vol. 36, no. 6, pp. 206–213, 2022.
- F. D. Calabrese, L. Wang, E. Ghadimi, G. Peters, L. Hanzo, and P. Soldati, “Learning radio resource management in RANs: Framework, opportunities, and challenges,” IEEE Communications Magazine, vol. 56, no. 9, pp. 138–145, 2018.
- A. Feriani and E. Hossain, “Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial,” IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1226–1252, 2021.
- P. H. Masur, J. H. Reed, and N. K. Tripathi, “Artificial intelligence in open-radio access network,” IEEE Aerospace and Electronic Systems Magazine, vol. 37, no. 9, pp. 6–15, 2022.
- L. Maggi, A. Valcarce, and J. Hoydis, “Bayesian optimization for radio resource management: Open loop power control,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 1858–1871, 2021.
- A. M. Nagib, H. Abou-zeid, and H. S. Hassanein, “Toward safe and accelerated deep reinforcement learning for next-generation wireless networks,” IEEE Network, vol. 37, no. 2, pp. 182–189, 2023.
- L. Bonati, M. Polese, S. D’Oro, S. Basagni, and T. Melodia, “Intelligent closed-loop RAN control with xApps in OpenRAN Gym,” in European Wireless 2022; 27th European Wireless Conference, 2022, pp. 1–6.
- N. Kato, B. Mao, F. Tang, Y. Kawamoto, and J. Liu, “Ten challenges in advancing machine learning technologies toward 6G,” IEEE Wireless Communications, vol. 27, no. 3, pp. 96–103, 2020.
- P. Li, J. Thomas, X. Wang, A. Khalil, A. Ahmad, R. Inacio, S. Kapoor, A. Parekh, A. Doufexi, A. Shojaeifard, and R. J. Piechocki, “RLOps: Development life-cycle of reinforcement learning aided open RAN,” IEEE Access, vol. 10, pp. 113808–113826, 2022.
- A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, “Experienced deep reinforcement learning with generative adversarial networks (GANs) for model-free ultra reliable low latency communication,” IEEE Transactions on Communications, vol. 69, no. 2, pp. 884–899, 2021.
- J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, no. 1, pp. 1437–1480, 2015.
- C. T. Nguyen, N. Van Huynh, N. H. Chu, Y. M. Saputra, D. T. Hoang, D. N. Nguyen, Q.-V. Pham, D. Niyato, E. Dutkiewicz, and W.-J. Hwang, “Transfer learning for wireless networks: A comprehensive survey,” Proceedings of the IEEE, vol. 110, no. 8, pp. 1073–1115, 2022.
- H. Zhou, M. Erol-Kantarci, and V. Poor, “Knowledge transfer and reuse: A case study of AI-enabled resource management in RAN slicing,” IEEE Wireless Communications, pp. 1–10, 2022.
- R. A. Ferrús Ferré, J. Pérez Romero, J. O. Sallent Roig, I. Vilà Muñoz, and R. Agustí Comes, “Machine learning-assisted cross-slice radio resource optimization: Implementation framework and algorithmic solution,” ITU Journal on Future and Evolving Technologies (ITU J-FET), vol. 1, no. 1, pp. 1–18, 2020.
- A. Abouaomar, A. Taik, A. Filali, and S. Cherkaoui, “Federated deep reinforcement learning for open RAN slicing in 6G networks,” IEEE Communications Magazine, vol. 61, no. 2, pp. 126–132, 2023.
- H. Zhang, H. Zhou, and M. Erol-Kantarci, “Federated deep reinforcement learning for resource allocation in O-RAN slicing,” in IEEE Global Communications Conference (GLOBECOM), 2022, pp. 958–963.
- N. Hammami and K. K. Nguyen, “On-policy vs. off-policy deep reinforcement learning for resource allocation in open radio access network,” in IEEE Wireless Communications and Networking Conference (WCNC), 2022, pp. 1461–1466.
- M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “ColO-RAN: Developing machine learning-based xApps for open RAN closed-loop control on programmable experimental platforms,” IEEE Transactions on Mobile Computing, vol. 22, no. 10, pp. 5787–5800, 2023.
- A. Filali, B. Nour, S. Cherkaoui, and A. Kobbane, “Communication and computation O-RAN resource slicing for URLLC services using deep reinforcement learning,” IEEE Communications Standards Magazine, vol. 7, no. 1, pp. 66–73, 2023.
- F. Lotfi, O. Semiari, and F. Afghah, “Evolutionary deep reinforcement learning for dynamic slice management in O-RAN,” in IEEE Globecom Workshops (GC Wkshps), 2022, pp. 227–232.
- J. Saad, K. Khawam, M. Yassin, S. Costanzo, and K. Boulos, “Crowding game and deep Q-networks for dynamic RAN slicing in 5G networks,” in Proceedings of the 20th ACM International Symposium on Mobility Management and Wireless Access, ser. MobiWac ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 37–46.
- S. Zhao, H. Abou-zeid, R. Atawia, Y. S. K. Manjunath, A. B. Sediq, and X.-P. Zhang, “Virtual reality gaming on the cloud: A reality check,” in IEEE Global Communications Conference (GLOBECOM), 2021, pp. 1–6.
- L. Bonati, S. D’Oro, M. Polese, S. Basagni, and T. Melodia, “Intelligence and learning in O-RAN for data-driven NextG cellular networks,” IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, 2021.
- O. Sallent, J. Perez-Romero, R. Ferrus, and R. Agusti, “On radio access network slicing from a radio resource management perspective,” IEEE Wireless Communications, vol. 24, no. 5, pp. 166–174, 2017.
- A. M. Nagib, H. Abou-Zeid, and H. S. Hassanein, “Transfer learning-based accelerated deep reinforcement learning for 5G RAN slicing,” in IEEE 46th Conference on Local Computer Networks (LCN), 2021, pp. 249–256.
- T. Leibovich-Raveh, D. J. Lewis, S. Al-Rubaiey Kadhim, and D. Ansari, “A new method for calculating individual subitizing ranges,” Journal of Numerical Cognition, vol. 4, no. 2, pp. 429–447, Sep. 2018.
- A. M. Nagib, H. Abou-Zeid, and H. S. Hassanein, “Accelerating reinforcement learning via predictive policy transfer in 6G RAN slicing,” IEEE Transactions on Network and Service Management, vol. 20, no. 2, pp. 1170–1183, 2023.
- Z. Zhu, K. Lin, A. K. Jain, and J. Zhou, “Transfer learning in deep reinforcement learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–20, 2023.
- K. A.M., F. Hu, and S. Kumar, “Intelligent spectrum management based on transfer actor-critic learning for rateless transmissions in cognitive radio networks,” IEEE Transactions on Mobile Computing, vol. 17, no. 5, pp. 1204–1215, 2018.
- A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. van Hasselt, and D. Silver, “Successor features for transfer in reinforcement learning,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, p. 4058–4068.
- A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell, “Policy distillation,” arXiv preprint arXiv:1511.06295, 2015.
- R. Li, Z. Zhao, Q. Sun, C.-L. I, C. Yang, X. Chen, M. Zhao, and H. Zhang, “Deep reinforcement learning for resource management in network slicing,” IEEE Access, vol. 6, pp. 74429–74441, 2018.
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- A. M. Nagib, H. Abou-Zeid, and H. S. Hassanein, “How does forecasting affect the convergence of DRL techniques in O-RAN slicing?” arXiv preprint arXiv:2309.00489, 2023.