Scalable Reinforcement Learning for Linear-Quadratic Control of Networks (2401.16183v2)
Abstract: Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems where distributed state-feedback controllers can give near-optimal performance. More specifically, we consider networked linear-quadratic controllers with decoupled costs and spatially exponentially decaying dynamics. We aim to exploit the structure in the problem to design a scalable reinforcement learning algorithm for learning a distributed controller. Recent work has shown that the optimal controller can be well approximated using only information from a $\kappa$-neighborhood of each agent. Motivated by these results, we show that a similar decay property holds for the agents' individual value and Q-functions. We then design an algorithm, based on the actor-critic framework, that learns distributed controllers using only local information. Specifically, the Q-function is estimated by modifying the Least-Squares Temporal Difference for Q-functions (LSTDQ) method to use only local information. The algorithm then updates the policy using gradient descent. Finally, we evaluate the algorithm through simulations that suggest near-optimal performance.
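To make the described actor-critic loop concrete, the following is a minimal sketch of one such iteration for a single agent: a quadratic Q-function is fit from the agent's neighborhood data with an LSTDQ-style least-squares solve, and the local feedback gain is then updated with a gradient-type step. This is an illustration under assumed simplifications (discounted cost, a toy scalar agent whose neighborhood is itself, arbitrary hyperparameters), not the paper's exact algorithm; all function names are invented for this example.

```python
import numpy as np

def quad_features(z):
    """phi(z) such that z^T H z = theta^T phi(z) for a symmetric H."""
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

def unpack_symmetric(theta, d):
    """Rebuild the symmetric matrix H from the parameter vector theta."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

def lstdq_local(z, z_next, cost, gamma=0.95, reg=1e-6):
    """LSTDQ fit of a quadratic Q-function from one agent's local data.
    z[t]      : neighborhood state stacked with the agent's action at time t
    z_next[t] : the same quantity at t+1 under the current policy
    cost[t]   : the agent's local stage cost at time t
    Returns H such that Q(z) ~= z^T H z."""
    Phi = np.array([quad_features(v) for v in z])
    Phi_next = np.array([quad_features(v) for v in z_next])
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ np.asarray(cost)
    theta = np.linalg.solve(A + reg * np.eye(A.shape[1]), b)
    return unpack_symmetric(theta, len(z[0]))

def policy_gradient_step(K, H, n_u, step=0.1):
    """Gradient-style update of the local gain u = -K x_nbr.
    With z = [x_nbr; u], 2*(H_uu K - H_ux) is the policy-gradient
    direction for LQR up to a state-covariance weighting."""
    H_uu = H[-n_u:, -n_u:]
    H_ux = H[-n_u:, :-n_u]
    return K - step * 2.0 * (H_uu @ K - H_ux)

# Toy usage: scalar agent, x_{t+1} = 0.9 x_t + u_t + noise, cost x_t^2 + u_t^2.
rng = np.random.default_rng(0)
K = np.array([[0.0]])                              # current local gain
x = np.array([1.0])
z, z_next, cost = [], [], []
for _ in range(500):
    u = -K @ x + 0.1 * rng.standard_normal(1)      # exploration noise
    x_new = 0.9 * x + u + 0.01 * rng.standard_normal(1)
    z.append(np.concatenate([x, u]))
    z_next.append(np.concatenate([x_new, -K @ x_new]))
    cost.append(float(x @ x + u @ u))
    x = x_new
H = lstdq_local(z, z_next, cost)
K = policy_gradient_step(K, H, n_u=1)
```

In the multi-agent setting of the paper, each agent would build its vector z from the states of its $\kappa$-neighborhood rather than from the full network, which is what keeps the critic's feature dimension, and hence the algorithm, scalable.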