Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning (2105.05716v6)

Published 12 May 2021 in cs.AI and cs.LG

Abstract: Model-based reinforcement learning (MBRL) aims to learn model(s) of the environment dynamics that can predict the outcome of an agent's actions. Forward application of the model yields so-called imagined trajectories (sequences of action and predicted state-reward) used to optimize the set of candidate actions that maximize expected reward. The outcome, an ideal imagined trajectory or plan, is imperfect, and MBRL typically relies on model predictive control (MPC) to overcome this by continuously re-planning from scratch, thus incurring major computational cost and increasing complexity in tasks with longer receding horizons. We propose uncertainty estimation methods for online evaluation of imagined trajectories to assess whether further planned actions can be trusted to deliver acceptable reward. These methods include comparing the error after performing the last action with the standard expected error, and using model uncertainty to assess the deviation from expected outcomes. Additionally, we introduce methods that exploit the forward propagation of the dynamics model to evaluate whether the remainder of the plan aligns with expected results and to assess the remainder of the plan in terms of expected reward. Our experiments demonstrate the effectiveness of the proposed uncertainty estimation methods by applying them to avoid unnecessary trajectory replanning in a shooting MBRL setting. The results highlight a significant reduction in computational cost without sacrificing performance.
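To make the idea concrete, the sketch below shows one plausible way such an online trust check could be wired into a shooting-style MPC loop. It assumes an ensemble of learned dynamics models and applies a simple k-sigma deviation test against the states imagined at planning time; the names and interfaces (`trust_remaining_plan`, `plan_fn`, a simplified `env.step`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def trust_remaining_plan(ensemble, state, remaining_actions,
                         expected_states, expected_stds, k_sigma=3.0):
    """Decide whether the rest of a previously optimized plan can still be
    trusted, instead of replanning from scratch at every step (MPC).

    ensemble: list of dynamics models, each mapping (state, action) -> next state.
    expected_states / expected_stds: per-step predicted states (and their spread)
    recorded when the plan was first imagined.

    Minimal sketch of one possible criterion (a k-sigma deviation test on the
    re-propagated plan); not the paper's exact algorithm.
    """
    s = state
    for t, a in enumerate(remaining_actions):
        preds = np.stack([m(s, a) for m in ensemble])        # ensemble rollout
        mean, std = preds.mean(axis=0), preds.std(axis=0)
        # If the fresh prediction drifts outside the band anticipated at
        # planning time, the imagined trajectory is no longer trustworthy.
        if np.any(np.abs(mean - expected_states[t]) > k_sigma * (expected_stds[t] + 1e-6)):
            return False
        s = mean
    return True

def run_episode(env, ensemble, plan_fn, horizon=25, k_sigma=3.0):
    """Hypothetical outer loop: replan only when the remaining plan fails the check.

    env is a simplified environment interface (step returns state, reward, done);
    plan_fn is a shooting planner (e.g. CEM) returning the action sequence plus
    the imagined states and their uncertainties.
    """
    state = env.reset()
    plan, exp_states, exp_stds = plan_fn(state, horizon)
    t, total_reward, done = 0, 0.0, False
    while not done:
        if t >= len(plan) or not trust_remaining_plan(
                ensemble, state, plan[t:], exp_states[t:], exp_stds[t:], k_sigma):
            plan, exp_states, exp_stds = plan_fn(state, horizon)   # replan from scratch
            t = 0
        state, reward, done = env.step(plan[t])
        total_reward += reward
        t += 1
    return total_reward
```

The design point this sketch tries to capture is that the expensive shooting optimization is triggered only when the re-propagated plan drifts outside the uncertainty band recorded at planning time, so most steps can simply execute the next action of the existing plan.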

Authors (3)
  1. Adrian Remonda (3 papers)
  2. Eduardo Veas (9 papers)
  3. Granit Luzhnica (2 papers)
Citations (1)
