A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs (2310.12248v3)
Abstract: Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
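The paper's algorithm itself is not reproduced here, but the core idea of model-based PAC learning can be sketched: draw enough successor samples per state-action pair that the empirical transition model is close to the true one, then plan on the empirical model. The sketch below uses a generic Hoeffding-style per-pair sample bound for illustration only; the paper's actual bound is a different polynomial involving the epsilon-recurrence time, and the toy MDP, `sample` function, and helper names are assumptions made for this example.

```python
import math
import random
from collections import defaultdict

def required_samples(eps, delta, n_states, n_actions):
    """Generic Hoeffding-style per-(s, a) sample count so that every
    estimated transition probability is within eps of the truth with
    probability >= 1 - delta (illustrative; not the paper's bound)."""
    return math.ceil(math.log(2 * n_states * n_actions / delta) / (2 * eps ** 2))

def estimate_model(states, actions, sample, m):
    """Draw m successor samples for each (s, a) and normalize the counts
    into an empirical transition model -- the model-building step shared
    by model-based PAC algorithms such as R-Max."""
    model = {}
    for s in states:
        for a in actions:
            counts = defaultdict(int)
            for _ in range(m):
                counts[sample(s, a)] += 1
            model[(s, a)] = {t: c / m for t, c in counts.items()}
    return model

# Toy 2-state MDP with one action: stay put w.p. 0.7, move w.p. 0.3.
rng = random.Random(0)
def sample(s, a):
    return s if rng.random() < 0.7 else 1 - s

m = required_samples(eps=0.05, delta=0.1, n_states=2, n_actions=1)
model = estimate_model([0, 1], [0], sample, m)
```

For an omega-regular objective, a planner would then run on `model` (e.g., via its product with an automaton for the objective); the point of the sketch is only that the sample count `m` is polynomial in the relevant parameters.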