
An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems (2312.03451v2)

Published 6 Dec 2023 in eess.SY and cs.SY

Abstract: In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm relies only on persistently exciting input-output data measured offline. No model knowledge or state measurements are needed, and the resulting optimal policy uses only past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.
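The abstract describes learning an optimal output-feedback policy purely from persistently exciting input-output data, with no model knowledge or state measurements. The sketch below illustrates that general idea only; it is not the paper's exact algorithm. It builds an extended state from the last N inputs and outputs, estimates a quadratic Q-function by least squares on off-policy Bellman residuals, and improves the policy from the Q-function kernel. The toy system matrices, history length N, cost weights, and the least-squares policy-iteration scheme are all illustrative assumptions.

```python
# Hypothetical sketch of off-policy Q-learning for output-feedback LQR
# from input-output data (a simplified stand-in for the paper's method).
import numpy as np

rng = np.random.default_rng(0)

# Toy system (unknown to the learner; used only to generate offline data).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

N = 2            # history length (>= observability index), assumed known
q, r = 1.0, 0.1  # output and input cost weights (illustrative)

# --- Collect one persistently exciting input-output trajectory offline ---
T = 400
x = np.zeros((2, 1))
us, ys = [], []
for k in range(T):
    ys.append((C @ x).item())
    u = rng.normal()                   # exploratory (persistently exciting) input
    us.append(u)
    x = A @ x + B * u

nz = 2 * N                             # dimension of the extended state (SISO case)
n = nz + 1                             # dimension of [z; u]

def zvec(k):
    """Extended state built from the last N outputs and N inputs."""
    return np.array(ys[k - N:k] + us[k - N:k])

def phi(z, u):
    """Quadratic features: upper triangle of [z;u][z;u]^T, cross terms doubled."""
    v = np.append(z, u)
    M = np.outer(v, v)
    M = 2 * M - np.diag(np.diag(M))    # so that w @ phi(z, u) == v @ H @ v below
    return M[np.triu_indices(n)]

def unpack(w):
    """Rebuild the symmetric Q-function kernel H from its parameter vector w."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = w
    return H + H.T - np.diag(np.diag(H))

# --- Off-policy Q-learning via least-squares policy iteration ---
K = np.zeros(nz)                       # initial policy u = -K z (open loop is stable here)
for it in range(10):
    Phi, c = [], []
    for k in range(N, T - 1):
        zk, zk1 = zvec(k), zvec(k + 1)
        uk = us[k]
        # Bellman equation under policy K: Q(z_k,u_k) - Q(z_{k+1},-K z_{k+1}) = cost_k
        Phi.append(phi(zk, uk) - phi(zk1, -K @ zk1))
        c.append(q * ys[k] ** 2 + r * uk ** 2)
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = unpack(w)
    Hzu, Huu = H[:nz, nz:], H[nz:, nz:]
    K = (Hzu / Huu).ravel()            # policy improvement: u = argmin_u Q(z, u)
    print(f"iteration {it}: K = {np.round(K, 3)}")
```

Under these assumptions the printed gains settle after a few iterations to an optimal output-feedback gain for the extended state; the paper's contribution lies in a more computationally efficient formulation of this kind of data-based iteration, together with convergence guarantees.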

Authors (3)
  1. Mohammad Alsalti (11 papers)
  2. Victor G. Lopez (27 papers)
  3. Matthias A. Müller (93 papers)
