
Reinforcement Learning with Elastic Time Steps (2402.14961v4)

Published 22 Feb 2024 in cs.RO and cs.LG

Abstract: Traditional Reinforcement Learning (RL) policies are typically implemented with fixed control rates, often disregarding the impact of control rate selection. This can lead to inefficiencies, as the optimal control rate varies with task requirements. We propose the Multi-Objective Soft Elastic Actor-Critic (MOSEAC), an off-policy actor-critic algorithm that uses elastic time steps to dynamically adjust the control frequency. This approach minimizes computational resources by selecting the lowest viable frequency. We show, at the theoretical level, that MOSEAC converges and produces stable policies, and we validate our findings in a real-time 3D racing game. MOSEAC significantly outperformed other variable time step approaches in terms of energy efficiency and task effectiveness. Additionally, MOSEAC demonstrated faster and more stable training, showcasing its potential for real-world RL applications in robotics.
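
As a rough illustration of the elastic time-step idea described in the abstract, the sketch below shows an environment step in which the agent's action is augmented with a variable duration, and a multi-objective reward trades the task reward off against the control rate. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights, duration bounds, toy dynamics, and function names are illustrative.

```python
import numpy as np

# Assumed bounds and weights for the elastic time step (not from the paper).
DT_MIN, DT_MAX = 0.02, 0.5      # allowed range of the chosen time step (s)
ALPHA_TASK = 1.0                # weight on the task reward
ALPHA_RATE = 0.1                # penalty weight on high control rates

def elastic_reward(task_reward, dt):
    """Multi-objective reward: reward task progress, penalize short time
    steps (i.e., high control frequencies) by rewarding longer durations."""
    rate_penalty = (DT_MAX - dt) / (DT_MAX - DT_MIN)   # 0 at DT_MAX, 1 at DT_MIN
    return ALPHA_TASK * task_reward - ALPHA_RATE * rate_penalty

def step_with_elastic_dt(state, control, dt, dynamics, task_reward_fn):
    """Hold `control` constant for the agent-chosen duration `dt`,
    integrating a fixed-rate simulator in small sub-steps."""
    dt = float(np.clip(dt, DT_MIN, DT_MAX))
    sub_dt = 0.01                                      # simulator's native step
    for _ in range(max(1, int(round(dt / sub_dt)))):
        state = dynamics(state, control, sub_dt)
    reward = elastic_reward(task_reward_fn(state), dt)
    return state, reward, dt

# Toy usage: a 1-D point mass driven toward the origin.
def dynamics(state, control, sub_dt):
    pos, vel = state
    vel += control * sub_dt
    pos += vel * sub_dt
    return np.array([pos, vel])

def task_reward_fn(state):
    return -abs(state[0])

state = np.array([1.0, 0.0])
# A policy would output both the control and the duration; one such action:
state, reward, dt = step_with_elastic_dt(state, control=-0.5, dt=0.2,
                                         dynamics=dynamics,
                                         task_reward_fn=task_reward_fn)
print(f"dt={dt:.2f}s, reward={reward:.3f}")
```

In this framing the duration becomes part of the action space, so an off-policy actor-critic such as SAC can learn to pick the lowest control rate that still solves the task, which is the trade-off the abstract describes.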
