Value Gradient weighted Model-Based Reinforcement Learning (2204.01464v2)
Abstract: Model-based reinforcement learning (MBRL) is a sample-efficient technique for obtaining control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is typically fitted solely to reconstruct dynamics, state observations in particular, so the impact of model error on the policy is not captured by the training objective. This creates a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function used in practice, future state prediction. Naive intuition would suggest that value-aware model learning should fix this problem, and indeed several solutions to this objective mismatch have been proposed based on theoretical analysis. In practice, however, they tend to underperform the commonly used maximum likelihood (MLE) based approaches. In this paper we propose Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning that improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions. We analyze both MLE and value-aware approaches, demonstrate how existing value-aware losses fail to account for exploration and the behavior of function approximation, and highlight the additional goals that must be met to stabilize optimization in the deep learning setting. We verify our analysis by showing that our loss function achieves high returns on the MuJoCo benchmark suite while being more robust than maximum likelihood based approaches.
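To make the core idea concrete, here is a minimal PyTorch sketch of a value-gradient weighted model loss in the spirit the abstract describes: instead of a plain reconstruction loss, the per-dimension model error is projected onto the gradient of the value function at the observed next state before squaring. All names (`model`, `value_fn`, the batch tensors) are illustrative assumptions, not the paper's actual API, and the loss form is a sketch reconstructed from the abstract's description rather than a definitive implementation.

```python
import torch

def vagram_style_loss(model, value_fn, states, actions, next_states):
    """Sketch of a value-gradient weighted model loss.

    The squared model error is weighted by the gradient of the value
    function at the observed next state, so state dimensions that do
    not affect the value (e.g., distracting dimensions) contribute
    little to the model-learning objective.
    """
    # Gradient of the value function at the observed next states.
    # It is detached afterwards: it acts as a fixed weighting, not as
    # a path for gradients into the value function.
    ns = next_states.detach().requires_grad_(True)
    grad_v = torch.autograd.grad(value_fn(ns).sum(), ns)[0].detach()  # [B, d]

    pred_next = model(states, actions)   # predicted next state, [B, d]
    err = pred_next - next_states        # per-dimension model error

    # Inner product of the error with the value gradient, then squared:
    # errors along directions the value function is insensitive to are
    # down-weighted relative to a plain mean-squared-error loss.
    return (grad_v * err).sum(dim=-1).pow(2).mean()
```

Because the value gradient is detached, this term can simply replace (or augment) the usual MSE reconstruction loss in a standard MBRL training loop; the rest of the algorithm, model rollouts and policy optimization, is unchanged.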
Authors: Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand