Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills (2312.06518v1)
Abstract: Offline meta-reinforcement learning (meta-RL) methods, which adapt to unseen target tasks using prior experience, are essential in robot control. Current methods typically use task contexts and skills as prior experience, where task contexts capture information within each task and skills are temporally extended actions for solving subtasks. However, these methods still perform poorly when adapting to unseen target tasks, mainly because the learned prior experience does not generalize: they cannot extract effective prior experience from meta-training tasks through exploration and learning over continuous latent spaces. We propose a framework called decoupled meta-reinforcement learning (DCMRL), which (1) contrastively constrains the learning of task contexts by pulling together task contexts sampled from the same task and pushing apart those from different tasks, and (2) uses a Gaussian quantization variational autoencoder (GQ-VAE) to cluster the Gaussian distributions of task contexts and skills, decoupling the exploration and learning of their respective spaces. The cluster centers, which serve as representative discrete distributions of task contexts and skills, are stored in a task context codebook and a skill codebook, respectively. DCMRL acquires generalizable prior experience and adapts effectively to unseen target tasks during the meta-testing phase. Experiments on continuous-control navigation and robot-manipulation tasks show that DCMRL is more effective than previous meta-RL methods, acquiring more generalizable prior experience.
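The contrastive constraint on task contexts is essentially an instance-discrimination objective. Below is a minimal sketch, assuming an InfoNCE-style loss over task-context embeddings; the function name, batch layout, and temperature are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def task_context_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style sketch: pull the anchor toward a context from the same
    task (positive) and push it away from contexts of other tasks (negatives).

    anchor:    (B, D) task-context embeddings
    positive:  (B, D) embeddings drawn from the same tasks as `anchor`
    negatives: (B, K, D) embeddings drawn from K different tasks per anchor
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logits = (anchor * positive).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)

    logits = torch.cat([pos_logits, neg_logits], dim=-1)                      # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)  # the positive sits at index 0
```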
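The GQ-VAE step can be read as vector quantization applied to Gaussian parameters rather than to point embeddings. Here is a minimal sketch, assuming a VQ-VAE-style straight-through estimator and a squared 2-Wasserstein distance between diagonal Gaussians; the abstract does not specify either choice, and all names are hypothetical:

```python
import torch

def quantize_gaussian(mu, sigma, codebook_mu, codebook_sigma):
    """Snap a diagonal Gaussian N(mu, sigma) to the nearest codebook Gaussian.

    mu, sigma:                   (B, D) posterior parameters from the encoder
    codebook_mu, codebook_sigma: (C, D) learnable cluster centers

    Assumed distance: squared 2-Wasserstein between diagonal Gaussians,
    W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    d_mu = torch.cdist(mu, codebook_mu) ** 2          # (B, C) mean distances
    d_sigma = torch.cdist(sigma, codebook_sigma) ** 2 # (B, C) scale distances
    idx = (d_mu + d_sigma).argmin(dim=-1)             # nearest cluster per sample

    q_mu, q_sigma = codebook_mu[idx], codebook_sigma[idx]
    # Straight-through estimator: forward pass uses the quantized values,
    # backward pass copies gradients through to the encoder outputs.
    st_mu = mu + (q_mu - mu).detach()
    st_sigma = sigma + (q_sigma - sigma).detach()

    # Codebook and commitment terms as in VQ-VAE; 0.25 is a common default.
    commit = ((mu - q_mu.detach()) ** 2).mean() + ((sigma - q_sigma.detach()) ** 2).mean()
    embed = ((mu.detach() - q_mu) ** 2).mean() + ((sigma.detach() - q_sigma) ** 2).mean()
    return st_mu, st_sigma, idx, embed + 0.25 * commit
```

Keeping the codebook entries as full Gaussians, rather than point vectors, is what lets the cluster centers act as representative distributions over task contexts and skills, which a policy can then sample from at meta-test time.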