FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes (2306.10407v1)
Abstract: Inverse Reinforcement Learning (IRL) is a compelling technique for revealing the rationale underlying the behavior of autonomous agents. IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories. However, IRL requires a transition function, and most algorithms assume it is known or can be estimated from data in advance. The problem therefore becomes more challenging when the transition dynamics are not known a priori, since they enter the estimation of the policy in addition to determining the system's evolution. When the dynamics of the agents in the state-action space are described by stochastic differential equations (SDEs) in Itô calculus, these transitions can be inferred from the mean-field theory described by the Fokker-Planck (FP) equation. We conjecture that there exists an isomorphism between the time-discrete FP equation and the MDP that extends beyond the minimization of free energy (in FP) and the maximization of reward (in the MDP). We identify specific manifestations of this isomorphism and use them to create a novel physics-aware IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories. We employ variational system identification to infer the potential function in FP, which in turn allows evaluation of the reward, transition, and policy by leveraging the conjecture. We demonstrate the effectiveness of FP-IRL by applying it to a synthetic benchmark and to a biological problem of cancer cell dynamics, where the transition function is inaccessible.
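To make the FP-MDP correspondence the abstract alludes to more concrete, the following is a minimal sketch of the standard background relationships it builds on, assuming a potential-driven Itô SDE with drift given by the negative gradient of a potential and constant diffusion. The symbols ψ, D, ρ, α, and the soft-Bellman form are our notation and the usual maximum-entropy RL convention, not definitions taken from the paper.

```latex
% Standard relationships (assumed notation, not taken verbatim from the paper)
% between a potential-driven Ito SDE, its Fokker-Planck (FP) equation, and the
% free energy that the FP dynamics dissipates, alongside the entropy-regularized
% (soft) Bellman backup on the MDP side.
\begin{align}
  % Ito SDE for the agent's state-action variable x_t
  \mathrm{d}x_t &= -\nabla\psi(x_t)\,\mathrm{d}t + \sqrt{2D}\,\mathrm{d}W_t, \\
  % FP equation for the density rho(x,t) of x_t
  \partial_t \rho &= \nabla\!\cdot\!\left(\rho\,\nabla\psi\right) + D\,\nabla^2\rho, \\
  % Free energy = potential term + entropy term; the FP flow decreases F
  % monotonically in time
  F[\rho] &= \int \psi\,\rho\,\mathrm{d}x + D\int \rho\ln\rho\,\mathrm{d}x,
  \qquad \frac{\mathrm{d}}{\mathrm{d}t}\,F[\rho(\cdot,t)] \le 0, \\
  % MDP-side counterpart: soft value function and Boltzmann policy with
  % temperature alpha, as in maximum-entropy RL
  V(s) &= \alpha \log \sum_a \exp\!\big(Q(s,a)/\alpha\big),
  \qquad \pi(a\mid s) \propto \exp\!\big(Q(s,a)/\alpha\big).
\end{align}
```

Heuristically, free-energy minimization on the FP side mirrors entropy-regularized reward maximization on the MDP side, which suggests the inferred potential ψ plays a role analogous to a negative reward; the precise mapping, however, is the paper's conjectured isomorphism rather than anything implied by these standard identities alone.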