
Data-Driven Robust Reinforcement Learning Control of Uncertain Nonlinear Systems: Towards a Fully-Automated, Insulin-Based Artificial Pancreas (2312.04503v1)

Published 7 Dec 2023 in eess.SY and cs.SY

Abstract: In this paper, a novel robust tracking control scheme for a general class of discrete-time nonlinear systems affected by unknown bounded uncertainty is presented. By solving a parameterized optimal tracking control problem subject to the unknown nominal system and a suitable cost function, the resulting optimal tracking control policy ensures closed-loop stability by achieving a sufficiently small tracking error for the original uncertain nonlinear system. The optimal tracking controller is computed through a novel Q-function-based $\lambda$-Policy Iteration algorithm, which not only enjoys rigorous theoretical guarantees but also avoids technical weaknesses of conventional reinforcement learning methods. Using a data-driven, critic-only least-squares implementation, the proposed algorithm is evaluated on the problem of fully-automated, insulin-based, closed-loop glucose control for patients diagnosed with Type 1 and Type 2 Diabetes Mellitus. The U.S. FDA-accepted DMMS.R simulator from the Epsilon Group is used to conduct a comprehensive in silico clinical campaign on a rich set of virtual subjects under completely unannounced meal and exercise settings. Simulation results underline the superior glycaemic behavior achieved by the proposed approach, as well as its overall maturity for the design of highly-effective, closed-loop drug delivery systems for personalized medicine.
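
The core machinery named in the abstract is $\lambda$-policy iteration, which interpolates between value iteration ($\lambda = 0$) and exact policy iteration ($\lambda = 1$) through the multistep evaluation operator $T_\mu^{(\lambda)} J = J + (I - \lambda\gamma P_\mu)^{-1}(T_\mu J - J)$. The paper itself develops a Q-function-based, data-driven variant with a critic-only least-squares implementation for continuous-state tracking under uncertainty; as a rough orientation only, the minimal sketch below applies the basic $\lambda$-PI recursion to a finite Markov decision process with known dynamics. The function name and the toy problem are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lambda_policy_iteration(P, g, gamma=0.95, lam=0.5, n_iters=50):
    """Tabular lambda-policy iteration for a finite MDP with stage costs.

    P : (A, S, S) array, P[a, s, s2] = Prob(next state s2 | state s, action a).
    g : (S, A) array, g[s, a] = stage cost of taking action a in state s.
    NOTE: illustrative sketch only, not the paper's Q-function-based,
    data-driven algorithm for continuous-state tracking control.
    """
    A, S, _ = P.shape
    J = np.zeros(S)
    for _ in range(n_iters):
        # Policy improvement: greedy (cost-minimizing) with respect to J.
        Q = g + gamma * np.einsum('ast,t->sa', P, J)   # Q[s, a]
        mu = Q.argmin(axis=1)
        # lambda-policy evaluation: one application of the operator
        # T_mu^(lam) J = J + (I - lam*gamma*P_mu)^{-1} (T_mu J - J),
        # which reduces to a value-iteration step at lam = 0 and to
        # exact policy evaluation at lam = 1.
        P_mu = P[mu, np.arange(S), :]                  # (S, S) rows under mu
        g_mu = g[np.arange(S), mu]
        TJ = g_mu + gamma * P_mu @ J                   # one-step Bellman backup
        J = J + np.linalg.solve(np.eye(S) - lam * gamma * P_mu, TJ - J)
    return J, mu

# Hypothetical two-state, two-action toy problem.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
g = np.array([[1.0, 2.0], [0.5, 1.5]])
J, mu = lambda_policy_iteration(P, g, gamma=0.9, lam=0.7)
```

Intermediate values of $\lambda$ trade off the fast policy improvement of policy iteration against the cheaper, more incremental updates of value iteration, a trade-off studied extensively in the $\lambda$-PI literature.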

Authors (2)
  1. Alexandros Tanzanakis (4 papers)
  2. John Lygeros (222 papers)
