Scalable Reinforcement Learning for Linear-Quadratic Control of Networks (2401.16183v2)
Abstract: Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems in which distributed state feedback controllers can give near-optimal performance. More specifically, we consider networked linear-quadratic controllers with decoupled costs and spatially exponentially decaying dynamics. We aim to exploit the structure in the problem to design a scalable reinforcement learning algorithm for learning a distributed controller. Recent work has shown that the optimal controller can be well approximated using only information from a $\kappa$-neighborhood of each agent. Motivated by these results, we show that similar results hold for the agents' individual value and Q-functions. We continue by designing an algorithm, based on the actor-critic framework, that learns distributed controllers using only local information. Specifically, the Q-function is estimated by modifying the least-squares temporal difference for Q-functions (LSTD-Q) method to use only local information. The algorithm then updates the policy using gradient descent. Finally, we evaluate the algorithm through simulations, which indeed suggest near-optimal performance.
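The abstract describes the two core steps of the method only at a high level. The following is a minimal, hypothetical Python (NumPy) sketch of what those steps could look like for a single agent: a local LSTD-Q critic fit on data from the agent's $\kappa$-neighborhood, followed by a gradient-descent actor update of the agent's local feedback gain. The function names, the quadratic feature parameterization, and the identity-state-covariance simplification in the actor step are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch (not the authors' exact algorithm) of a local LSTD-Q critic and a
# gradient-descent actor step for one agent in a networked LQR problem.
# Assumptions (hypothetical, for illustration only):
#   * z = (x_N, u_N) stacks the states and inputs of the agent's
#     kappa-neighborhood, and the local Q-function is modeled as the
#     quadratic form Q(z) = z^T H z.
#   * Data is collected under a fixed exploring local policy, so the
#     discounted LSTD-Q fixed point identifies H for that policy.
import numpy as np

def quad_features(z):
    """Quadratic feature vector: unique entries of z z^T.

    Off-diagonal products appear twice in z^T H z, so they are scaled by 2;
    then the parameter vector equals the upper triangle of H."""
    n = z.size
    idx = np.triu_indices(n)
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)
    return scale * np.outer(z, z)[idx]

def lstdq_local(zs, costs, zs_next, gamma=0.95, reg=1e-6):
    """Estimate the local Q-function by LSTD-Q.

    zs, zs_next : neighborhood (state, input) vectors at times t and t+1,
                  where the t+1 input is drawn from the current local policy.
    costs       : local stage costs c_i(t).
    Returns a symmetric matrix H with Q(z) ~= z^T H z."""
    d = quad_features(zs[0]).size
    A = reg * np.eye(d)           # regularized LSTD-Q matrix
    b = np.zeros(d)
    for z, c, z_next in zip(zs, costs, zs_next):
        phi, phi_next = quad_features(z), quad_features(z_next)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * c
    theta = np.linalg.solve(A, b)
    # Rebuild the symmetric H from its estimated upper-triangular entries.
    n = zs[0].size
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta
    return H + H.T - np.diag(np.diag(H))

def policy_gradient_step(K_local, H, nx, step=1e-2):
    """One actor update for the local gain in u = -K_local x.

    With Q(x, u) = [x; u]^T H [x; u], the gradient of the closed-loop cost
    with respect to K is 2 (H_uu K - H_ux) E[x x^T]; here the state
    covariance is taken as the identity, as a simplifying assumption."""
    H_ux = H[nx:, :nx]
    H_uu = H[nx:, nx:]
    grad = 2.0 * (H_uu @ K_local - H_ux)
    return K_local - step * grad
```

In a full networked implementation of the kind the abstract describes, each agent would run both steps in parallel using only trajectories of its own $\kappa$-neighborhood, alternating critic fits and actor updates until the local gains converge.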