Mastering Memory Tasks with World Models (2403.04253v1)
Abstract: Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies. This limits their ability to solve tasks that involve extended time gaps between actions and outcomes, or tasks that require recalling distant observations to inform current actions. To improve temporal coherence, we integrate a new family of state space models (SSMs) into the world models of MBRL agents, yielding a new method, Recall to Imagine (R2I). This integration aims to enhance both long-term memory and long-horizon credit assignment. Through a diverse set of illustrative tasks, we systematically demonstrate that R2I not only establishes a new state of the art on challenging memory and credit assignment RL tasks, such as BSuite and POPGym, but also achieves superhuman performance in the complex memory domain of Memory Maze. At the same time, it maintains comparable performance on classic RL tasks, such as Atari and DMC, suggesting the generality of our method. We also show that R2I is faster than the state-of-the-art MBRL method, DreamerV3, resulting in faster wall-clock convergence.