A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs (2310.12248v3)

Published 18 Oct 2023 in cs.LG and cs.LO

Abstract: Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
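For context on the abstract's claims, a probably approximately correct (PAC) guarantee for policy learning is conventionally stated as follows (a sketch of the standard PAC formulation, not the paper's exact theorem; the MDP \mathcal{M}, its state and action sets S and A, and the objective symbol \varphi are illustrative notation assumed here): with probability at least 1 - \delta over the sampled trajectories, the learned policy \hat{\pi} is \varepsilon-optimal,

    \Pr_{\mathcal{M}}^{\hat{\pi}}(\varphi) \;\ge\; \max_{\pi} \Pr_{\mathcal{M}}^{\pi}(\varphi) - \varepsilon,

where \Pr_{\mathcal{M}}^{\pi}(\varphi) denotes the probability that policy \pi satisfies the omega-regular objective \varphi in \mathcal{M}. Per the abstract, the number of samples needed to achieve this is polynomial in the relevant parameters, among them the \varepsilon-recurrence time the paper introduces.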
