End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control (2308.01674v4)
Abstract: (Economic) nonlinear model predictive control ((e)NMPC) requires dynamic models that are sufficiently accurate and computationally tractable. Data-driven surrogate models for mechanistic models can reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum prediction accuracy on simulation samples and perform suboptimally in (e)NMPC. We present a method for end-to-end reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC. We apply our method to two applications derived from an established nonlinear continuous stirred-tank reactor model. The controller performance is compared to that of (e)NMPCs utilizing models trained using system identification, and to that of model-free neural network controllers trained using reinforcement learning. We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC, and that, in contrast to the neural network controllers, the (e)NMPC controllers can react to changes in the control setting without retraining.
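To make the idea of a Koopman surrogate model concrete: the state of a nonlinear system is lifted into a higher-dimensional space of observables in which the dynamics (approximately) evolve linearly, so a cheap linear predictor can replace the mechanistic model inside (e)NMPC. The sketch below is a minimal illustration of the system-identification baseline (EDMD-style least squares with control inputs) on a hypothetical toy system; the `step` dynamics and the hand-picked `lift` observables are assumptions for illustration only, and the paper instead trains the lifting and the linear matrices end-to-end via reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, u):
    # hypothetical toy nonlinear dynamics (stand-in for the CSTR model)
    return np.array([0.9 * x[0] + 0.1 * x[1] ** 2, 0.8 * x[1] + 0.2 * u])

def lift(x):
    # hand-picked observables; in the paper's setting this is a learned encoder
    return np.array([x[0], x[1], x[1] ** 2])

# collect one trajectory under random excitation
X, U = [], []
x = np.array([1.0, -0.5])
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    X.append(x)
    U.append(u)
    x = step(x, u)
X.append(x)

Z = np.array([lift(xi) for xi in X])     # lifted states, shape (201, 3)
Uarr = np.array(U).reshape(-1, 1)        # inputs, shape (200, 1)

# EDMD with inputs: solve Z[k+1] ≈ A Z[k] + B u[k] column-wise by least squares
Phi = np.hstack([Z[:-1], Uarr])          # regressors, shape (200, 4)
AB, *_ = np.linalg.lstsq(Phi, Z[1:], rcond=None)
A, B = AB[:3].T, AB[3:].T                # A: (3, 3), B: (3, 1)

# one-step prediction with the linear surrogate in lifted space
z_next = A @ lift(np.array([1.0, -0.5])) + B @ np.array([0.3])
```

Because the surrogate is linear in the lifted state, the downstream (e)NMPC problem stays computationally tractable; the paper's contribution is replacing the prediction-error objective above with the closed-loop control objective during training.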
Authors: Daniel Mayfrank, Alexander Mitsos, Manuel Dahmen