On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning (2210.10763v2)
Abstract: Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether the internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. Through offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.
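The abstract outlines a two-stage recipe: offline multi-task pretraining of a world model on data from several source tasks, followed by online finetuning on a new target task, where scarce real interactions are supplemented with imagined rollouts for policy improvement. Below is a minimal Python sketch of that loop. Every name in it (`WorldModel`, `Policy`, `pretrain_offline`, `finetune_online`, the step counts) is a hypothetical placeholder for illustration, not the paper's actual API; the real system builds on EfficientZero rather than on these stubs.

```python
"""Hedged sketch of the pretrain-then-finetune recipe described in the
abstract. All classes and functions are hypothetical placeholders."""

import random


class WorldModel:
    """Stand-in for a learned world model (representation + dynamics)."""

    def update(self, batch):
        """One gradient step on a batch of (real or offline) transitions."""
        pass  # placeholder: no actual learning happens in this sketch

    def imagine(self, obs, horizon):
        """Unroll the learned dynamics to produce synthetic transitions."""
        return [(obs, "imagined_action", 0.0) for _ in range(horizon)]


class Policy:
    """Stand-in for the agent's policy/value heads."""

    def act(self, obs):
        return "noop"

    def improve(self, real_batch, imagined_batch):
        pass  # placeholder: policy improvement on real + imagined data


def pretrain_offline(model, multitask_data, steps):
    """Stage 1: offline multi-task pretraining on data from several tasks."""
    for _ in range(steps):
        task = random.choice(multitask_data)  # sample a pretraining task
        model.update(random.choice(task))     # train on one of its batches


def finetune_online(model, policy, env_step, steps, horizon=5):
    """Stage 2: online cross-task finetuning on the new target task,
    supplementing real interactions with imagined rollouts."""
    replay, obs = [], "initial_obs"
    for _ in range(steps):
        action = policy.act(obs)
        obs, transition = env_step(obs, action)  # real interaction
        replay.append(transition)
        model.update(random.sample(replay, 1))   # finetune the model
        imagined = model.imagine(obs, horizon)   # cheap synthetic data
        policy.improve(replay[-1:], imagined)    # improve on real + imagined


if __name__ == "__main__":
    model, policy = WorldModel(), Policy()
    offline = [[("obs", "act", 0.0)] * 8 for _ in range(4)]  # 4 toy tasks

    def dummy_step(obs, act):
        return "next_obs", (obs, act, 0.0)

    pretrain_offline(model, offline, steps=100)
    finetune_online(model, policy, dummy_step, steps=50)
```

The design point the abstract emphasizes is that Stage 1 amortizes world-model learning across tasks, so Stage 2 starts from transferable dynamics rather than learning a model of the world from scratch.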