Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations (2312.15909v1)
Abstract: Generalization and sample efficiency have been long-standing issues in reinforcement learning, and the field of Offline Meta-Reinforcement Learning (OMRL) has therefore gained increasing attention for its potential to solve a wide range of problems with static and limited offline data. Existing OMRL methods often assume sufficient training tasks and data coverage in order to apply contrastive learning to extract task representations. However, such assumptions do not hold in many real-world applications, which undermines the generalization ability of the learned representations. In this paper, we consider OMRL under two types of data limitations, namely limited training tasks and limited behavior diversity, and propose a novel algorithm called GENTLE for learning generalizable task representations in the face of these limitations. GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of tasks. Unlike existing methods, the TAE is optimized solely by reconstructing state transitions and rewards, which captures the generative structure of the task models and yields generalizable representations even when training tasks are limited. To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train the TAE with the data distribution encountered during testing. Empirically, GENTLE significantly outperforms existing OMRL methods on both in-distribution and out-of-distribution tasks, under both the given-context and the one-shot testing protocols.
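As a rough illustration of the encoder-decoder idea described in the abstract, below is a minimal PyTorch-style sketch of a task auto-encoder trained purely by reconstructing next states and rewards from a context of transitions. This is not the paper's implementation: the class name `TaskAutoEncoder`, the network sizes, the mean-pooling aggregation over the context, and the joint (s', r) decoder head are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TaskAutoEncoder(nn.Module):
    """Hypothetical TAE sketch: the encoder aggregates a context of
    (s, a, r, s') tuples into a task embedding z; the decoder reconstructs
    the reward and next state from (s, a, z). The only training signal is
    the reconstruction loss, with no contrastive term."""

    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        trans_dim = state_dim + action_dim + 1 + state_dim  # (s, a, r, s')
        self.encoder = nn.Sequential(
            nn.Linear(trans_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # predicts (s', r) jointly
        )

    def encode(self, context):
        # context: (batch, context_len, trans_dim); mean-pool the per-transition
        # codes so the embedding is permutation-invariant over the context.
        return self.encoder(context).mean(dim=1)

    def loss(self, context, s, a, r, s_next):
        z = self.encode(context)                          # (batch, latent_dim)
        pred = self.decoder(torch.cat([s, a, z], dim=-1))
        target = torch.cat([s_next, r], dim=-1)
        return nn.functional.mse_loss(pred, target)       # reconstruction only


if __name__ == "__main__":
    # Toy usage with made-up dimensions (e.g., a HalfCheetah-like task).
    tae = TaskAutoEncoder(state_dim=17, action_dim=6)
    s, a = torch.randn(32, 17), torch.randn(32, 6)
    r, s_next = torch.randn(32, 1), torch.randn(32, 17)
    context = torch.randn(32, 20, 17 + 6 + 1 + 17)
    tae.loss(context, s, a, r, s_next).backward()
```

The pseudo-transition construction mentioned in the abstract would, on this sketch, amount to relabeling transitions so that the context distribution fed to `encode` during training matches what the meta-test policy would collect; the abstract does not specify the construction in enough detail to reproduce here.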