Reward Shaping via Diffusion Process in Reinforcement Learning (2306.11885v1)

Published 20 Jun 2023 in cs.LG and cs.AI

Abstract: Reinforcement Learning (RL) models have continually evolved to navigate the exploration-exploitation trade-off in uncertain Markov Decision Processes (MDPs). In this study, I leverage the principles of stochastic thermodynamics and system dynamics to explore reward shaping via diffusion processes, which provides an elegant framework for thinking about the exploration-exploitation trade-off. This article sheds light on the relationships between information entropy and stochastic system dynamics, and on their influence on entropy production. This exploration allows us to construct a dual-pronged framework that can be interpreted either as a maximum entropy program for deriving efficient policies or as a modified cost optimization program that accounts for informational costs and benefits. This work presents a novel perspective on the physical nature of information and its implications for online learning in MDPs, thereby providing a better understanding of information-oriented formulations in RL.
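
As a concrete illustration of the two ingredients the abstract combines, the sketch below implements standard entropy-regularized ("soft") value iteration together with classical potential-based reward shaping, r'(s,a,s') = r(s,a,s') + γΦ(s') − Φ(s), on a toy two-state MDP. This is a minimal sketch of the generic maximum-entropy and reward-shaping machinery only: the MDP, the potential Φ, and the temperature α are made-up illustrative choices, and the code does not implement the paper's diffusion-process construction.

```python
import numpy as np

# Minimal sketch: entropy-regularized ("soft") value iteration with
# potential-based reward shaping on a made-up two-state, two-action MDP.
# All quantities (P, R, phi, alpha) are illustrative assumptions; this is
# NOT the paper's diffusion-process formulation.

gamma = 0.95   # discount factor
alpha = 0.5    # entropy temperature (exploration-exploitation knob)

# P[s, a, s'] = transition probabilities; R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Potential-based shaping: r'(s,a,s') = r(s,a,s') + gamma*phi(s') - phi(s).
# Taking the expectation over s' gives R + gamma*(P @ phi) - phi.
phi = np.array([0.0, 1.0])                 # arbitrary potential function
R_shaped = R + gamma * (P @ phi) - phi[:, None]

# Soft Bellman iteration: V(s) = alpha * log sum_a exp(Q(s,a) / alpha).
V = np.zeros(2)
for _ in range(1000):
    Q = R_shaped + gamma * (P @ V)         # soft Q-values, shape (2, 2)
    V_new = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

pi = np.exp((Q - V[:, None]) / alpha)      # Boltzmann policy pi(a|s)
print("soft values:", V)
print("policy:\n", pi / pi.sum(axis=1, keepdims=True))
```

A useful property of this pairing: policy invariance under potential-based shaping (Ng, Harada, and Russell, 1999) carries over to the entropy-regularized setting. The shaped soft values differ from the unshaped ones only by Φ(s), so the resulting Boltzmann policy is identical with or without shaping.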

