To the Max: Reinventing Reward in Reinforcement Learning (2402.01361v2)
Abstract: In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck in suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach to using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate the benefits of max-reward RL over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
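To make the difference between the two objectives concrete, the following is a minimal sketch, not the paper's implementation. It assumes the max-reward objective takes the discounted form max over t of gamma^t * r_t, which may differ in detail from the formulation used in the paper. The function names and the two example reward sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Standard RL objective: discounted sum of rewards along a trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def max_reward_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Assumed max-reward objective: largest discounted reward along a
    non-empty trajectory (illustrative form, not taken from the paper)."""
    return max(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical goal-reaching trajectories with a distance-based reward:
# "loiter" hovers at a moderate distance from the goal for all 10 steps,
# "reach" collects nothing for 9 steps and then hits the goal exactly.
loiter = [0.5] * 10
reach = [0.0] * 9 + [1.0]

for name, traj in (("loiter", loiter), ("reach", reach)):
    print(
        f"{name}: cumulative={cumulative_return(traj):.3f}, "
        f"max={max_reward_return(traj):.3f}"
    )
```

In this toy example the cumulative objective prefers the loitering trajectory, while the max-reward objective prefers the one that actually reaches the goal. This illustrates how two objectives built on the same per-step reward can rank behaviors differently.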