Policy Gradient-based Model Free Optimal LQG Control with a Probabilistic Risk Constraint (2403.16767v2)
Abstract: In this paper, we investigate a model-free optimal control design that minimizes an infinite-horizon average expected quadratic cost of states and control actions subject to a probabilistic risk or chance constraint, using input-output data. In particular, we consider linear time-invariant systems and design an optimal controller within the class of linear state feedback control. Three policy gradient (PG) based algorithms, natural policy gradient (NPG), Gauss-Newton policy gradient (GNPG), and deep deterministic policy gradient (DDPG), are developed and compared with the optimal risk-neutral linear-quadratic regulator (LQR) and a scenario-based model predictive control (MPC) technique via numerical simulations. The convergence properties and the accuracy of all the algorithms are compared numerically. We also establish analytical convergence properties of the NPG and GNPG algorithms under the known-model setting, while the proof of convergence for the unknown-model setting is part of our ongoing work.
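As context for the NPG and GNPG updates named in the abstract, the sketch below shows the corresponding risk-neutral, known-model LQR policy gradient steps (in the style of Fazel et al., cited below). It is a minimal illustration only: the system matrices A, B, weights Q, R, noise covariance W, and step sizes are assumed for demonstration, and the paper's probabilistic risk constraint handling is omitted.

```python
# Minimal sketch: known-model, risk-neutral NPG and Gauss-Newton PG steps for LQR.
# A, B, Q, R, W and step sizes are illustrative assumptions, not the paper's values.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Illustrative system: x_{t+1} = A x_t + B u_t + w_t, w_t ~ N(0, W), u_t = -K x_t
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
W = 0.01 * np.eye(2)  # process-noise covariance

def lqr_quantities(K):
    """Closed-loop value matrix P_K, stationary covariance S_K, and E_K term."""
    Acl = A - B @ K
    # P_K = Acl' P_K Acl + Q + K' R K   (discrete Lyapunov equation)
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # S_K = Acl S_K Acl' + W            (stationary state covariance)
    S = solve_discrete_lyapunov(Acl, W)
    # grad J(K) = 2 E_K S_K
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return P, S, E

def npg_step(K, eta=0.05):
    """Natural PG: K <- K - eta * grad J(K) * S_K^{-1} = K - 2*eta*E_K."""
    _, _, E = lqr_quantities(K)
    return K - 2.0 * eta * E

def gnpg_step(K, eta=0.5):
    """Gauss-Newton PG: K <- K - 2*eta*(R + B'P_K B)^{-1} E_K.
    With eta = 0.5 this recovers exact policy iteration."""
    P, _, E = lqr_quantities(K)
    return K - 2.0 * eta * np.linalg.solve(R + B.T @ P @ B, E)

K = np.zeros((1, 2))  # stabilizing initial gain (A itself is stable here)
for _ in range(50):
    K = gnpg_step(K)

# Compare against the Riccati (DARE) solution of the risk-neutral LQR
P_star = solve_discrete_are(A, B, Q, R)
K_star = np.linalg.solve(R + B.T @ P_star @ B, B.T @ P_star @ A)
print("GNPG gain:", K)
print("DARE gain:", K_star)
```

Under these assumptions the GNPG iterates converge to the DARE gain; the paper's risk-constrained algorithms additionally enforce the chance constraint, e.g. via a constraint-dependent modification of this update.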
- A. Tsiamis, D. S. Kalogerias, L. F. O. Chamon, A. Ribeiro, and G. J. Pappas, “Risk-Constrained Linear-Quadratic Regulators,” in 2020 59th IEEE Conference on Decision and Control (CDC), Dec. 2020, pp. 3040–3047.
- G. Schildbach, L. Fagiano, C. Frei, and M. Morari, “The scenario approach for Stochastic Model Predictive Control with bounds on closed-loop constraint violations,” Automatica, vol. 50, no. 12, pp. 3009–3018, Dec. 2014.
- J. Fleming and M. Cannon, “Stochastic MPC for Additive and Multiplicative Uncertainty Using Sample Approximations,” IEEE Trans. Automat. Contr., vol. 64, no. 9, pp. 3883–3888, Sep. 2019.
- E. Arcari, A. Iannelli, A. Carron, and M. N. Zeilinger, “Stochastic MPC with robustness to bounded parametric uncertainty,” IEEE Transactions on Automatic Control, pp. 1–14, 2023.
- S. Kerz, J. Teutsch, T. Brüdigam, M. Leibold, and D. Wollherr, “Data-Driven Tube-Based Stochastic Predictive Control,” IEEE Open Journal of Control Systems, vol. 2, pp. 185–199, 2023.
- F. Zhao, X. Fu, and K. You, “Global Convergence of Policy Gradient Methods for Output Feedback Linear Quadratic Control,” arXiv preprint arXiv:2211.04051, 2022.
- F. Zhao and K. You, “Primal-dual learning for the model-free risk-constrained linear quadratic regulator,” in Learning for Dynamics and Control. PMLR, 2021, pp. 702–714.
- F. Zhao, K. You, and T. Basar, “Infinite-horizon Risk-constrained Linear Quadratic Regulator with Average Cost,” in 2021 60th IEEE Conference on Decision and Control (CDC). Austin, TX, USA: IEEE, Dec. 2021, pp. 390–395.
- L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko, “Reinforcement learning for control: Performance, stability, and deep approximators,” Annual Reviews in Control, vol. 46, pp. 8–28, Jan. 2018.
- V. G. Lopez, M. Alsalti, and M. A. Müller, “Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems,” IEEE Transactions on Automatic Control, pp. 1–12, 2023.
- M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning. PMLR, 2018, pp. 1467–1476.
- B. Hu, K. Zhang, N. Li, M. Mesbahi, M. Fazel, and T. Başar, “Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 6, no. 1, pp. 123–158, 2023.
- Z. Yang, Y. Chen, M. Hong, and Z. Wang, “Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost,” in Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., 2019.
- K. Zhang, B. Hu, and T. Başar, “Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence,” SIAM J. Control Optim., vol. 59, no. 6, pp. 4081–4109, Jan. 2021.
- F. Zhao, K. You, and T. Başar, “Global Convergence of Policy Gradient Primal–Dual Methods for Risk-Constrained LQRs,” IEEE Trans. Automat. Contr., vol. 68, no. 5, pp. 2934–2949, May 2023.
- M. Han, Y. Tian, L. Zhang, J. Wang, and W. Pan, “Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee,” Automatica, vol. 129, p. 109689, Jul. 2021.
- A. Naha and S. Dey, “Reinforcement learning based optimal control with a probabilistic risk constraint,” arXiv preprint arXiv:2305.15755, 2023.
- A. Rajeswaran, K. Lowrey, E. V. Todorov, and S. M. Kakade, “Towards Generalization and Simplicity in Continuous Control,” in Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., 2017.
- S. M. Kakade, “A natural policy gradient,” in Advances in Neural Information Processing Systems, vol. 14, 2001.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in International Conference on Learning Representations (ICLR), 2016, arXiv:1509.02971.
- J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” in International Conference on Learning Representations (ICLR), 2016.
- W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,” IEEE Transactions on Communications, vol. 54, no. 7, pp. 1310–1322, 2006.
- J. Skaf and S. Boyd, “Nonlinear Q-design for convex stochastic control,” IEEE Transactions on Automatic Control, vol. 54, no. 10, pp. 2426–2430, 2009.
- I. Coope, “On matrix trace inequalities and related topics for products of Hermitian matrices,” Journal of Mathematical Analysis and Applications, vol. 188, no. 3, pp. 999–1001, 1994.