Koopman-Assisted Reinforcement Learning (2403.02290v1)
Abstract: The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "Koopman tensor" that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman-Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
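The lifting idea in the abstract can be illustrated with a minimal EDMD-with-control sketch. This is an assumption-laden toy, not the paper's implementation: we pick a hand-chosen dictionary of observables `phi`, append action-dependent terms (which play a role loosely analogous to the paper's action-parameterized "Koopman tensor"), and fit a linear operator `K` by least squares so that `phi(x_next) ≈ K @ psi(x, u)`.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Dictionary of observables on the state: constant, linear, quadratic terms.
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def psi(x, u):
    # State observables augmented with action-dependent terms (u, x*u, u^2),
    # so the lifted dynamics can depend linearly on these control features.
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2,
                     u, x1 * u, x2 * u, u * u])

# Training data from a simple controlled linear system x_{t+1} = A x_t + B u_t.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([0.0, 1.0])

X, U = [], []
x = rng.standard_normal(2)
for _ in range(500):
    u = rng.uniform(-1.0, 1.0)
    X.append(x)
    U.append(u)
    x = A @ x + B * u
X.append(x)

Psi = np.stack([psi(X[t], U[t]) for t in range(500)], axis=1)   # 10 x 500
PhiY = np.stack([phi(X[t + 1]) for t in range(500)], axis=1)    # 6 x 500

# Least-squares fit of the lifted linear operator (EDMD-style regression).
K = PhiY @ np.linalg.pinv(Psi)

# One-step prediction error in the lifted coordinates.
err = np.linalg.norm(K @ Psi - PhiY) / np.linalg.norm(PhiY)
print(f"relative one-step lifting error: {err:.2e}")
```

Because the toy system is linear and the dictionary contains all products generated by one step of the dynamics, the fit is essentially exact here; for genuinely nonlinear systems the same regression yields only an approximation, which is the regime the paper's algorithms target.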