Human-Inspired Framework to Accelerate Reinforcement Learning (2303.08115v3)
Abstract: Reinforcement learning (RL) is central to data-driven decision-making but suffers from sample inefficiency, particularly in real-world scenarios where physical interactions are costly. This paper introduces a novel human-inspired framework that improves the sample efficiency of RL algorithms. The framework first exposes the learning agent to simpler tasks that progressively increase in complexity, culminating in the main task. It requires no pre-training: each simpler task is learned for only a single iteration. The knowledge acquired along the way can feed various transfer-learning approaches, such as value and policy transfer, without increasing computational complexity, and the framework applies across different goals, environments, and RL algorithms, including value-based, policy-based, tabular, and deep RL methods. Experimental evaluations on a simple Random Walk and on more complex optimal control problems with constraints demonstrate the framework's effectiveness in improving sample efficiency, especially on challenging main tasks.
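The idea described in the abstract — solving progressively harder tasks for one iteration each and transferring the learned values into the main task — can be illustrated with a minimal tabular sketch. This is an assumption-laden illustration, not the paper's implementation: the environment is a simple 1-D random walk (one of the paper's test cases), and the `transfer` rule that copies a smaller Q-table into a larger one is a hypothetical stand-in for the paper's value-transfer mechanism.

```python
import numpy as np

def q_learning(n_states, episodes, Q=None, alpha=0.1, gamma=0.99, eps=0.1, rng=None):
    """Tabular Q-learning on a 1-D random walk: start in the middle,
    actions 0/1 move left/right, reward 1 only at the rightmost state."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if Q is None:
        Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = n_states // 2
        while 0 < s < n_states - 1:  # both ends are terminal
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2 = s + (1 if a == 1 else -1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

def transfer(Q_small, n_states):
    """Hypothetical value-transfer rule: seed the larger task's table
    with the values learned on the smaller task."""
    Q = np.zeros((n_states, 2))
    Q[:Q_small.shape[0]] = Q_small
    return Q

# Curriculum of simpler tasks, each learned for a single iteration,
# then the main task warm-started from the transferred values.
Q = None
for n in (5, 9, 13):
    Q = q_learning(n, episodes=1, Q=None if Q is None else transfer(Q, n))
Q_main = q_learning(13, episodes=200, Q=Q)
```

The same warm-start pattern extends to policy transfer (initializing a policy network from the simpler task) or to deep RL by replacing the Q-table with a function approximator.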