
Koopman-Assisted Reinforcement Learning (2403.02290v1)

Published 4 Mar 2024 in cs.AI, cs.LG, math.DS, and math.OC

Abstract: The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "Koopman tensor" that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
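To make the "Koopman tensor" idea in the abstract concrete, the sketch below fits a control-parameterized lifted-linear model from trajectory data: it lifts states and actions through dictionaries phi and psi, solves a least-squares problem for a tensor K such that E[phi(x')] ≈ (K ×₂ psi(u)) phi(x), and uses the resulting action-conditioned linear operator for one-step prediction in lifted coordinates. This is a minimal illustrative sketch, not the authors' implementation: the monomial/affine dictionaries, the toy dynamics f, and all dimensions are assumptions chosen for clarity.

```python
# Minimal sketch (assumed dictionaries and toy system, not the paper's code):
# fit a Koopman tensor K with E[phi(x')] ≈ (K ×_2 psi(u)) phi(x).
import numpy as np

def phi(x):
    # State dictionary: monomials up to degree 2 (illustrative choice).
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def psi(u):
    # Action dictionary: affine in the control (illustrative choice).
    return np.array([1.0, u[0]])

def f(x, u):
    # Toy controlled nonlinear system used only to generate data.
    return np.array([0.9 * x[0] + 0.1 * x[1],
                     -0.2 * np.sin(x[0]) + 0.8 * x[1] + u[0]])

# Collect a noisy trajectory under random actions.
rng = np.random.default_rng(0)
X, U, Xp = [], [], []
x = rng.normal(size=2)
for _ in range(2000):
    u = rng.normal(size=1)
    xp = f(x, u) + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xp.append(xp)
    x = xp

Phi  = np.array([phi(x)  for x  in X])   # (N, d_phi)
Psi  = np.array([psi(u)  for u  in U])   # (N, d_psi)
PhiP = np.array([phi(xp) for xp in Xp])  # (N, d_phi)

# Regress phi(x') on the outer product psi(u) ⊗ phi(x).
Z = np.einsum('na,nb->nab', Psi, Phi).reshape(len(X), -1)
K_flat, *_ = np.linalg.lstsq(Z, PhiP, rcond=None)
K = K_flat.T.reshape(PhiP.shape[1], Psi.shape[1], Phi.shape[1])  # (d_phi, d_psi, d_phi)

# For a fixed action u, K(u) = K ×_2 psi(u) is a linear map on lifted states.
x0, u0 = np.array([0.5, -0.3]), np.array([0.2])
K_u = np.einsum('ijk,j->ik', K, psi(u0))
print(K_u @ phi(x0))    # predicted expected lifted next state
print(phi(f(x0, u0)))   # noise-free lifted next state, for comparison
```

In this view, value-function estimation becomes tractable because, for any fixed action, the expected evolution of dictionary functions (and hence of a value function expressed in that dictionary) is linear, which is what the paper's soft value iteration and SAC reformulations exploit.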
