Locally Constrained Representations in Reinforcement Learning (2209.09441v2)
Abstract: The success of Reinforcement Learning (RL) relies heavily on the ability to learn robust representations from observations of the environment. In most cases, representations learned purely through the RL loss can differ vastly across states, depending on how the value functions change. However, the representations learned need not be highly specific to the task at hand. Relying solely on the RL objective may yield representations that vary greatly across successive time steps. Moreover, since the RL loss has a changing target, the learned representations depend on how good the current values/policies are. Thus, disentangling representation learning from the main task allows the representations to capture not only task-specific features but also the environment dynamics. To this end, we propose locally constrained representations, in which an auxiliary loss forces the state representations to be predictable from the representations of neighboring states. The representations are thus driven not only by value/policy learning but also by an additional loss that prevents them from over-fitting to the value loss. We evaluate the proposed method on several known benchmarks and observe strong performance; in continuous control tasks in particular, our experiments show a significant performance improvement.
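Since the abstract describes the method only at a high level, the following is a minimal, hypothetical PyTorch sketch of what such a locally constrained auxiliary loss could look like: the representation of the current state is trained to be predictable from the representation of a neighboring (here, the previous) state. The `Encoder` and `NeighborPredictor` modules, the network shapes, the MSE objective, the stop-gradient on the target, and the weighting coefficient `lam` are all assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch based only on the abstract: the representation z_t of
# state s_t should be predictable from the representation z_{t-1} of its
# temporal neighbor s_{t-1}. All architectural choices below are assumptions.

class Encoder(nn.Module):
    """Maps raw observations to state representations."""
    def __init__(self, obs_dim: int, rep_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, rep_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class NeighborPredictor(nn.Module):
    """Predicts a state's representation from a neighboring state's representation."""
    def __init__(self, rep_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rep_dim, 256), nn.ReLU(),
            nn.Linear(256, rep_dim),
        )

    def forward(self, neighbor_rep: torch.Tensor) -> torch.Tensor:
        return self.net(neighbor_rep)

def local_constraint_loss(encoder, predictor, obs_prev, obs_curr):
    """Auxiliary loss: z_t should be predictable from z_{t-1}."""
    z_prev = encoder(obs_prev)
    z_curr = encoder(obs_curr)
    z_pred = predictor(z_prev)
    # Stop gradients through the target so the constraint regularizes the
    # encoder rather than collapsing it toward a constant representation.
    return F.mse_loss(z_pred, z_curr.detach())

# Usage: add the auxiliary term to the RL objective with a weight `lam`
# (a hypothetical hyperparameter, not specified in the abstract).
enc = Encoder(obs_dim=8, rep_dim=32)
pred = NeighborPredictor(rep_dim=32)
obs_tm1 = torch.randn(64, 8)  # batch of predecessor states s_{t-1}
obs_t = torch.randn(64, 8)    # batch of current states s_t
aux = local_constraint_loss(enc, pred, obs_tm1, obs_t)
# total_loss = rl_loss + lam * aux
```

One plausible design choice reflected here is detaching the prediction target: this keeps the auxiliary term a constraint on the encoder rather than an objective the encoder can trivially satisfy by making all representations identical.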