An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems (2312.03451v2)
Published 6 Dec 2023 in eess.SY and cs.SY
Abstract: In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm relies only on persistently exciting input-output data measured offline. No model knowledge or state measurements are needed, and the obtained optimal policy uses only past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms from the literature.
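The ingredients named in the abstract (an offline, persistently exciting input-output dataset; a policy built from past inputs and outputs; off-policy Q-function evaluation) can be illustrated with a generic sketch. The following is not the paper's algorithm: the plant matrices, cost weights, data length, and the past-data window `l` are all illustrative assumptions, and the scheme shown is standard least-squares policy iteration on a Q-function over a non-minimal "state" of stacked past inputs and outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative observable plant (an assumption; any observable LTI system works)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])
m, p, l = 1, 1, 2   # inputs, outputs, past-data window (>= observability index)

# --- Offline data collection with a persistently exciting (random) input ---
T = 400
u = rng.normal(size=(T, m))
x = np.zeros((T + 1, A.shape[0]))
y = np.zeros((T, p))
for k in range(T):
    y[k] = C @ x[k]
    x[k + 1] = A @ x[k] + B @ u[k]

def z_vec(k):
    """Non-minimal 'state': the last l outputs and inputs (no state measurement)."""
    return np.concatenate([y[k - l:k].ravel(), u[k - l:k].ravel()])

nz = l * (p + m)
Z  = np.array([z_vec(k)     for k in range(l, T)])      # z_k
Zp = np.array([z_vec(k + 1) for k in range(l, T - 1)])  # z_{k+1}
U  = u[l:T]
R  = (y[l:T] ** 2).sum(axis=1) + (u[l:T] ** 2).sum(axis=1)  # stage cost y'y + u'u

def phi(z, a):
    """Quadratic features: weighted upper triangle of [z;a][z;a]'."""
    w = np.concatenate([z, a])
    iu = np.triu_indices(len(w))
    weights = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return weights * np.outer(w, w)[iu]

# --- Off-policy policy iteration on the Q-function ---
d = nz + m
K = np.zeros((m, nz))   # initial policy u = -K z (stabilizing: the plant is stable)
for _ in range(15):
    N = len(Zp)
    # Bellman identity Q(z_k,u_k) = r_k + Q(z_{k+1}, -K z_{k+1}): the data were
    # generated by an arbitrary behavior input, but the target policy -K z is
    # plugged in at z_{k+1}, which is what makes the scheme off-policy.
    Phi = np.array([phi(Z[k], U[k]) - phi(Zp[k], -K @ Zp[k]) for k in range(N)])
    theta, *_ = np.linalg.lstsq(Phi, R[:N], rcond=None)
    bellman_residual = np.max(np.abs(Phi @ theta - R[:N]))
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    H = H + H.T - np.diag(np.diag(H))              # symmetric Q-function matrix
    K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])  # policy improvement step
```

After convergence, `u = -K z` is an output-feedback policy that uses only the past `l` measured inputs and outputs, mirroring the abstract's claim that no state measurements are required. The paper's contribution concerns a more efficient formulation and convergence guarantees; this sketch only shows the general data-based mechanism.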
Authors: Mohammad Alsalti, Victor G. Lopez, Matthias A. Müller