Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction (2404.05950v1)
Abstract: Multi-task reinforcement learning (MTRL) demonstrates potential for enhancing the generalization of a robot, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still be susceptible to conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed for the simultaneous learning of multiple tasks. TSAC decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). To alleviate conflicts resulting from excessive focus on specific tasks' details in the SP, the ACP incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective and achieve generalization across tasks. These additional rewards transform the original problem into a multi-objective MTRL problem. Furthermore, to convert the multi-objective MTRL problem into a single-objective formulation, TSAC assigns a virtual expected budget to the sparse rewards and employs the Lagrangian method to transform a constrained single-objective optimization into an unconstrained one. Experimental evaluations conducted on Meta-World's MT10 and MT50 benchmarks demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in both sample efficiency and effective action execution.
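As a rough illustration of the mechanism outlined in the abstract, the sketch below shows (i) one plausible way a shared-policy action could be combined with an ACP correction, and (ii) the standard Lagrangian relaxation of a budget constraint on the expected sparse return, with a dual-ascent update of the multiplier. The function names, the additive composition rule, the budget value, and all numbers are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
# Minimal sketch of the ideas in the abstract; not the authors' implementation.
import numpy as np

def combined_action(sp_action, acp_correction, correction_scale=0.1):
    """One plausible composition: final action = SP action plus a scaled ACP
    correction, clipped to the action bounds (assumed here to be [-1, 1])."""
    return np.clip(sp_action + correction_scale * acp_correction, -1.0, 1.0)

def lagrangian_objective(dense_return, sparse_return, lam, sparse_budget):
    """Unconstrained surrogate of the constrained problem
        maximize E[dense return]  s.t.  E[sparse return] >= budget,
    i.e. L = J_dense + lam * (J_sparse - budget)."""
    return dense_return + lam * (sparse_return - sparse_budget)

def dual_update(lam, sparse_return, sparse_budget, lr=1e-3):
    """Dual ascent on the multiplier: lam grows while the virtual expected
    budget on sparse rewards is violated, shrinks otherwise (kept >= 0)."""
    return max(0.0, lam - lr * (sparse_return - sparse_budget))

# Toy usage with made-up numbers.
action = combined_action(np.array([0.4, -0.2]), np.array([0.5, 0.1]))
lam = 1.0
obj = lagrangian_objective(dense_return=5.2, sparse_return=0.3, lam=lam, sparse_budget=0.5)
lam = dual_update(lam, sparse_return=0.3, sparse_budget=0.5)
```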