Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization (2206.12674v2)

Published 25 Jun 2022 in cs.LG and cs.AI

Abstract: The class of deep deterministic off-policy algorithms is effectively applied to solve challenging continuous control problems. Current approaches commonly utilize random noise as an exploration method, which has several drawbacks, including the need for manual adjustment for a given task and the absence of exploratory calibration during the training process. We address these challenges by proposing a novel guided exploration method that uses an ensemble of Monte Carlo Critics for calculating exploratory action correction. The proposed method enhances the traditional exploration scheme by dynamically adjusting exploration. Subsequently, we present a novel algorithm that leverages the proposed exploratory module for both policy and critic modification. The presented algorithm demonstrates superior performance compared to modern reinforcement learning algorithms across a variety of problems in the DMControl suite.

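The abstract describes the method only at a high level, so the following is a minimal, hypothetical sketch of how an ensemble of Monte Carlo critics might supply an exploratory action correction. The class and function names (`MCCritic`, `guided_action`), the gradient-ascent correction rule, and the step size `alpha` are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MCCritic(nn.Module):
    """Critic regressed onto Monte Carlo returns, estimating Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def guided_action(policy, critics, state, alpha=0.1, max_action=1.0):
    """Assumed correction rule: nudge the policy's action along the gradient
    of the ensemble-mean Monte Carlo value estimate, instead of adding
    unstructured random noise."""
    action = policy(state).detach().requires_grad_(True)
    q_mean = torch.stack([c(state, action) for c in critics]).mean(dim=0).sum()
    (grad,) = torch.autograd.grad(q_mean, action)
    corrected = action + alpha * grad          # exploratory action correction
    return corrected.clamp(-max_action, max_action).detach()
```

In a DDPG/TD3-style training loop, such a corrected action would presumably replace the usual Gaussian-noise-perturbed action during environment interaction, with the critics fitted to Monte Carlo returns from completed episodes. Because the correction depends on the critics' current value estimates rather than a fixed noise schedule, its magnitude can change over training, which is one plausible reading of the abstract's claim that exploration is adjusted dynamically.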