Approximate Control for Continuous-Time POMDPs (2402.01431v2)

Published 2 Feb 2024 in cs.LG, cs.SY, eess.SY, and q-bio.QM

Abstract: This work proposes a decision-making framework for partially observable systems in continuous time with discrete state and action spaces. As optimal decision-making becomes intractable for large state spaces, we employ approximation methods for both the filtering and the control problem that scale well with an increasing number of states. Specifically, we approximate the high-dimensional filtering distribution by projecting it onto a parametric family of distributions, and integrate it into a control heuristic based on the fully observable system to obtain a scalable policy. We demonstrate the effectiveness of our approach on several partially observed systems, including queueing systems and chemical reaction networks.
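To make the abstract's two ingredients concrete, below is a minimal, self-contained Python sketch: a projection (assumed-density) filter that tracks only the mean of a truncated Poisson belief for a continuous-time Markov jump process, combined with a QMDP-style heuristic that acts greedily on the Q-values of the fully observable MDP. The model (an admission-controlled, truncated M/M/1 queue with sparse Gaussian state snapshots), all rates and costs, and every function name are illustrative assumptions for this sketch, not the authors' exact construction.

```python
import numpy as np

S = 50                       # truncated queue-length state space {0, ..., S-1} (assumed)
ARRIVAL, SERVICE = 0.8, 1.0  # arrival / service rates (assumed)
ACTIONS = (0, 1)             # 0: reject arrivals, 1: admit arrivals
HOLD_COST, REJECT_COST = 1.0, 5.0

def generator(a):
    """Generator matrix Q_a (jump rates s -> s') of the controlled queue."""
    Q = np.zeros((S, S))
    for s in range(S):
        if a == 1 and s + 1 < S:
            Q[s, s + 1] = ARRIVAL        # admitted arrival
        if s > 0:
            Q[s, s - 1] = SERVICE        # service completion
        Q[s, s] = -Q[s].sum()
    return Q

def cost(s, a):
    """Holding cost plus an expected penalty rate when arrivals are blocked."""
    return HOLD_COST * s + (REJECT_COST * ARRIVAL if a == 0 else 0.0)

def solve_mdp(discount=0.99, iters=2000):
    """Q-values of the *fully observable* MDP via uniformization + value iteration."""
    unif = ARRIVAL + SERVICE
    P = {a: np.eye(S) + generator(a) / unif for a in ACTIONS}
    c = {a: cost(np.arange(S), a) / unif for a in ACTIONS}
    V = np.zeros(S)
    for _ in range(iters):
        V = np.min([c[a] + discount * P[a] @ V for a in ACTIONS], axis=0)
    return np.array([c[a] + discount * P[a] @ V for a in ACTIONS])  # shape (|A|, S)

def poisson_pmf(m):
    """Poisson(m) truncated to {0, ..., S-1} and renormalized."""
    k = np.arange(S)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, S)))))
    logp = k * np.log(m) - m - log_fact
    p = np.exp(logp - logp.max())
    return p / p.sum()

def propagate_mean(m, a, dt):
    """Projection-filter prediction step: push the master-equation flow through
    the Poisson family by matching its first moment, dm/dt = E_{Poisson(m)}[drift]."""
    pi = poisson_pmf(m)
    flow = generator(a).T @ pi           # master equation: d(pi)/dt = Q^T pi
    return max(m + dt * float(np.arange(S) @ flow), 1e-6)

def observation_update(m, y, sigma=2.0):
    """Assumed-density update: exact Bayes step under a Gaussian snapshot
    y = s + noise, then re-project by matching the posterior mean."""
    post = poisson_pmf(m) * np.exp(-0.5 * ((y - np.arange(S)) / sigma) ** 2)
    post /= post.sum()
    return float(np.arange(S) @ post)

def qmdp_action(m, Qsa):
    """Control heuristic: act greedily on the fully observable Q-values,
    averaged under the projected belief (a QMDP-style rule)."""
    return int(np.argmin(Qsa @ poisson_pmf(m)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Qsa = solve_mdp()
    m, s_true, dt = 5.0, 5, 0.01
    for step in range(2000):
        a = qmdp_action(m, Qsa)
        # crude Euler/thinning simulation of the true jump process
        if a == 1 and s_true + 1 < S and rng.random() < ARRIVAL * dt:
            s_true += 1
        if s_true > 0 and rng.random() < SERVICE * dt:
            s_true -= 1
        m = propagate_mean(m, a, dt)
        if step % 200 == 199:            # sparse, noisy state snapshots
            m = observation_update(m, s_true + rng.normal(0.0, 2.0))
    print(f"belief mean {m:.2f} vs. true state {s_true}")
```

For readability the sketch evaluates the moment flow with the full truncated generator; a scalable implementation in the spirit of the abstract would instead use closed-form moment equations for the chosen parametric family, so that the filter never touches the full state space.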

Authors (3)
  1. Yannick Eich (3 papers)
  2. Bastian Alt (14 papers)
  3. Heinz Koeppl (105 papers)
