Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator (2103.03808v4)

Published 5 Mar 2021 in eess.SY, cs.LG, and cs.SY

Abstract: In many practical control applications, the performance of a closed-loop system degrades over time due to changes in plant characteristics. There is therefore a strong need to redesign the controller without going through a system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is a promising approach that enables model-free redesign of optimal controllers for nonlinear dynamical systems based only on measurements of the closed-loop system. However, the learning process of RL usually requires a considerable number of trial-and-error experiments with the poorly controlled system, which may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design, in a model-free manner, a linear control law that attains some degree of control performance, and then train the nonlinear optimal control law with online RL while running the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed approach improves the transient learning performance of RL and the efficiency of its hyperparameter tuning.
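To make the first step of the two-step idea concrete, below is a minimal sketch of how a linear gain can be obtained from closed-loop data alone via least-squares Q-learning (policy iteration on a quadratic Q-function). Everything here is an illustrative assumption: the plant matrices, cost weights, data sizes, and the specific least-squares formulation are not taken from the paper, whose own offline RL algorithm and convergence guarantees are specified in the full text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "unknown" plant, used here only to generate data for the sketch.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)   # state cost weight (assumed)
Rc = np.eye(1)   # input cost weight (assumed)
n, m = 2, 1

def quad_features(x, u):
    """Upper-triangular quadratic monomials of z = [x; u]."""
    z = np.concatenate([x, u])
    return np.concatenate([np.outer(z, z)[i, i:] for i in range(n + m)])

# Offline batch of exploratory closed-loop data: (state, input, stage cost, next state).
data = []
x = rng.standard_normal(n)
for t in range(600):
    if t % 30 == 0:
        x = rng.standard_normal(n)        # occasional resets for richer excitation
    u = 0.5 * rng.standard_normal(m)      # exploratory input
    x_next = A @ x + B @ u
    cost = float(x @ Qc @ x + u @ Rc @ u)
    data.append((x.copy(), u.copy(), cost, x_next.copy()))
    x = x_next

# Policy iteration on the quadratic Q-function Q_K(x, u) = [x; u]' H [x; u].
K = np.zeros((m, n))                      # initial gain (stabilizing for this stable plant)
for _ in range(10):
    Phi, c = [], []
    for xk, uk, ck, xk1 in data:
        # Bellman equation under gain K: Q_K(x, u) - Q_K(x', K x') = stage cost.
        Phi.append(quad_features(xk, uk) - quad_features(xk1, K @ xk1))
        c.append(ck)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    # Unpack the parameter vector into the symmetric matrix H.
    H = np.zeros((n + m, n + m))
    idx = 0
    for i in range(n + m):
        for j in range(i, n + m):
            H[i, j] = H[j, i] = h[idx] if i == j else h[idx] / 2.0
            idx += 1
    # Policy improvement: u = K x with K = -H_uu^{-1} H_ux.
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])

print("data-driven linear gain K:", K)
# Step 2 (not sketched): train a nonlinear optimal control law with online RL
# while the learned linear law u = K x runs in parallel as a baseline controller.
```

The point of running such a data-driven linear law in parallel during the second, online RL step is the one the abstract makes: it keeps the closed loop reasonably well behaved while the nonlinear law is still learning, which improves transient learning performance and limits wear on the plant.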
