A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs (2310.12248v3)

Published 18 Oct 2023 in cs.LG and cs.LO

Abstract: Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
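
As a rough illustration of what "model-based" and "polynomial number of samples" mean in this setting, below is a minimal, hypothetical Python sketch of a generic transition-model estimation step with a Hoeffding-style sample bound. This is not the paper's algorithm; all function names, parameters, and the `sample_next_state` interface are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

def hoeffding_samples(epsilon: float, delta: float, num_events: int) -> int:
    """Samples per (state, action) pair so that, by Hoeffding's inequality and
    a union bound over `num_events` estimated probabilities, every empirical
    transition probability is within epsilon of its true value with
    probability at least 1 - delta."""
    return math.ceil(math.log(2 * num_events / delta) / (2 * epsilon ** 2))

def estimate_transition_model(sample_next_state, states, actions,
                              epsilon=0.1, delta=0.05):
    """Builds an empirical transition model of an unknown MDP.

    `sample_next_state(s, a)` is assumed to draw one successor state from the
    MDP's (unknown) transition distribution at (s, a)."""
    num_events = len(states) * len(actions) * len(states)  # one per (s, a, s') triple
    n = hoeffding_samples(epsilon, delta, num_events)
    p_hat = defaultdict(float)
    for s in states:
        for a in actions:
            counts = defaultdict(int)
            for _ in range(n):
                counts[sample_next_state(s, a)] += 1
            for s_next, c in counts.items():
                p_hat[(s, a, s_next)] = c / n
    # A planner would then optimize the objective on this approximate model,
    # e.g. via its product with an automaton for the omega-regular goal.
    return p_hat
```

The sketch only conveys the flavor of a sample-based model estimate; the paper's contribution additionally handles the limit behavior of omega-regular objectives (via the epsilon-recurrence time) rather than plain transition-probability accuracy.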

