
To the Max: Reinventing Reward in Reinforcement Learning (2402.01361v2)

Published 2 Feb 2024 in cs.LG

Abstract: In reinforcement learning (RL), different reward functions can define the same optimal policy yet lead to drastically different learning performance: with some, the agent gets stuck in suboptimal behavior, while with others it solves the task efficiently. Choosing a good reward function is therefore an extremely important yet challenging problem. In this paper, we explore an alternative approach to using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can easily be combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate their benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
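To make the distinction in the abstract concrete, here is a minimal toy sketch contrasting the standard cumulative value with a max-reward value on a deterministic chain. This is an illustrative example only, not the paper's algorithm: the chain, its rewards, and the undiscounted backups are assumptions, and the paper's exact treatment of discounting and stochasticity may differ.

```python
# Toy comparison of cumulative vs max-reward values on a deterministic
# 4-transition chain (states 0..4, single "right" action, terminal at 4).
# Hypothetical example; the paper's formulation may place discounting differently.

rewards = [0.1, 0.5, 0.2, 0.9]  # reward for transition s -> s+1
n = len(rewards)

# Standard (undiscounted) cumulative values: V(s) = r(s) + V(s+1)
v_sum = [0.0] * (n + 1)
for s in reversed(range(n)):
    v_sum[s] = rewards[s] + v_sum[s + 1]

# Max-reward values: the backup replaces the sum with a max,
# V(s) = max(r(s), V(s+1)), i.e. the best single reward reachable from s.
v_max = [0.0] * (n + 1)
for s in reversed(range(n)):
    v_max[s] = max(rewards[s], v_max[s + 1])

print(v_sum[0])  # sum of all rewards along the path (1.7)
print(v_max[0])  # largest single reward on the path (0.9)
```

Both objectives rank this trajectory the same way, but under sparse, goal-style rewards the max-reward objective only cares about the best event ever achieved, which is what makes it natural for the goal-reaching tasks the paper evaluates.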


