Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning (2210.16058v2)

Published 28 Oct 2022 in cs.LG, cs.AI, and cs.RO

Abstract: Reinforcement learning (RL) often struggles to accomplish sparse-reward long-horizon tasks in complex environments. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal, and doing so efficiently remains one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective that optimizes the entropy of both achieved goals and the new goals to be explored, enabling more efficient goal exploration in sub-goal-selection-based GCRL. To optimize this objective, we first explore and exploit frequently occurring goal-transition patterns mined from environments similar to the current task, composing them into skills via skill learning. The pre-trained skills are then applied during goal exploration. Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.
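The core idea in the abstract, choosing exploration sub-goals that raise the entropy of the union of achieved and newly explored goals, can be illustrated with a small density-based selection rule. The sketch below is a minimal illustration, not the authors' GEAPS implementation: it assumes a kernel density estimate over a buffer of achieved goals, and all names (select_subgoal, achieved_goals, candidate_goals, the Gaussian bandwidth) are hypothetical.

```python
# Illustrative sketch of entropy-driven sub-goal selection (not the
# authors' GEAPS code). A KDE over previously achieved goals estimates
# the goal density; candidates in low-density regions are preferred,
# since adding rarely visited goals raises the entropy of the goal set.
import numpy as np
from sklearn.neighbors import KernelDensity

def select_subgoal(achieved_goals: np.ndarray,
                   candidate_goals: np.ndarray,
                   bandwidth: float = 0.1) -> np.ndarray:
    """Return the candidate goal lying in the lowest-density region
    of the achieved-goal distribution."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(achieved_goals)
    log_density = kde.score_samples(candidate_goals)  # log p(g) per candidate
    return candidate_goals[np.argmin(log_density)]    # rarest goal wins

# Usage with toy 2-D goals: candidates far from the visited cluster
# get the lowest density and are selected for exploration.
rng = np.random.default_rng(0)
achieved = rng.normal(0.0, 0.2, size=(500, 2))     # goals reached so far
candidates = rng.uniform(-1.0, 1.0, size=(64, 2))  # proposed sub-goals
print(select_subgoal(achieved, candidates))
```

Picking the lowest-density candidate is a greedy proxy for maximizing entropy gain; the paper's contribution is to pair such exploration objectives with skills pre-trained from mined goal-transition patterns so the agent can actually reach the selected sub-goals.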

In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  2. Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17(1), 1334–1373 (2016) [4] Silver, D., Singh, S., Precup, D., Sutton, R.S.: Reward is enough. Artificial Intelligence 299, 103535 (2021) [5] Florensa, C., Held, D., Geng, X., Abbeel, P.: Automatic goal generation for reinforcement learning agents. In: International Conference on Machine Learning, pp. 1515–1528 (2018). PMLR [6] Pong, V.H., Dalal, M., Lin, S., Nair, A., Bahl, S., Levine, S.: Skew-fit: State-covering self-supervised reinforcement learning. In: International Conference on Machine Learning (2020). PMLR [7] Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Silver, D., Singh, S., Precup, D., Sutton, R.S.: Reward is enough. Artificial Intelligence 299, 103535 (2021) [5] Florensa, C., Held, D., Geng, X., Abbeel, P.: Automatic goal generation for reinforcement learning agents. In: International Conference on Machine Learning, pp. 1515–1528 (2018). PMLR [6] Pong, V.H., Dalal, M., Lin, S., Nair, A., Bahl, S., Levine, S.: Skew-fit: State-covering self-supervised reinforcement learning. In: International Conference on Machine Learning (2020). PMLR [7] Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. 
Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. 
arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. 
In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. 
In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. 
GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. 
In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  4. Florensa, C., Held, D., Geng, X., Abbeel, P.: Automatic goal generation for reinforcement learning agents. In: International Conference on Machine Learning, pp. 1515–1528 (2018). PMLR [6] Pong, V.H., Dalal, M., Lin, S., Nair, A., Bahl, S., Levine, S.: Skew-fit: State-covering self-supervised reinforcement learning. In: International Conference on Machine Learning (2020). PMLR [7] Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pong, V.H., Dalal, M., Lin, S., Nair, A., Bahl, S., Levine, S.: Skew-fit: State-covering self-supervised reinforcement learning. In: International Conference on Machine Learning (2020). PMLR [7] Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. 
arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. 
Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. 
In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). 
PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. 
arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. 
In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 
PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. 
arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). 
PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. 
In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  6. Pitis, S., Chan, H., Zhao, S., Stadie, B., Ba, J.: Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. In: International Conference on Machine Learning, pp. 7750–7761 (2020). PMLR [8] Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D., Pathak, D.: Discovering and achieving goals with world models. In: ICML 2021 Workshop on Unsupervised Reinforcement Learning (2021) [9] Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). 
PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. 
In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. 
Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. 
In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. 
In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. 
Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. 
In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). 
PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  8. Liu, M., Zhu, M.Z., Zhang, W.: Goal-conditioned reinforcement learning: Problems and solutions. In: International Joint Conference on Artificial Intelligence (IJCAI-22), pp. 5502–5511 (2022) [10] Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Florensa, C., Duan, Y., Abbeel, P.: Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017) [11] Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. 
Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Eysenbach, B., Gupta, A., Ibarz, J., Levine, S.: Diversity is all you need: Learning skills without a reward function. In: International Conference on Learning Representations (2019) [12] Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: Adversarially motivated intrinsic goals. arXiv preprint arXiv:2006.12122 (2020) [13] Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. 
Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. 
Advances in Neural Information Processing Systems 34 (2021) [14] Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). 
PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). 
PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019) [15] Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. 
In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. 
In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. 
arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. 
GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. 
In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. 
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  12. Hoang, C., Sohn, S., Choi, J., Carvalho, W., Lee, H.: Successor feature landmarks for long-horizon goal-conditioned reinforcement learning. Advances in Neural Information Processing Systems 34 (2021)
  13. Hartikainen, K., Geng, X., Haarnoja, T., Levine, S.: Dynamical distance learning for semi-supervised and unsupervised skill discovery. In: International Conference on Learning Representations (2019)
  14. Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019)
  15. Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR
  16. Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in Neural Information Processing Systems 30 (2017)
  17. Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR
  18. Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Artificial Intelligence, vol. 7, pp. 895–900 (2007)
  19. Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019)
  20. Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018)
  21. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE
  22. Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020)
  23. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
  24. Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020)
  25. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR
  26. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019)
  27. Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998)
  28. Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021)
  29. Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR
  30. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  31. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
  32. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 832–837 (1956)
  33. Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR
  34. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289 (2015)
  35. Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. 
In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. 
Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  14. Sharma, A., Gu, S., Levine, S., Kumar, V., Hausman, K.: Dynamics-aware unsupervised discovery of skills. arXiv preprint arXiv:1907.01657 (2019) [16] Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR [17] Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in neural information processing systems 30 (2017) [18] Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. 
The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR [19] Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Aritificial Intelligence, vol. 7, pp. 895–900 (2007) [20] Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019) [21] Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018) [22] Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). 
IEEE
  15. Campos, V., Trott, A., Xiong, C., Socher, R., Giró-i-Nieto, X., Torres, J.: Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International Conference on Machine Learning, pp. 1317–1327 (2020). PMLR
  16. Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. Advances in Neural Information Processing Systems 30 (2017)
arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE [23] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020) [24] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015) [25] Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. 
In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020) [26] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR [27] Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). 
PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019) [28] Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. 
arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  17. Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: International Conference on Machine Learning, pp. 1312–1320 (2015). PMLR
  18. Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforcement learning. In: International Joint Conference on Artificial Intelligence, vol. 7, pp. 895–900 (2007)
  19. Trott, A., Zheng, S., Xiong, C., Socher, R.: Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems 32 (2019)
  20. Plappert, M., Andrychowicz, M., Ray, A., McGrew, B., Baker, B., Powell, G., Schneider, J., Tobin, J., Chociej, M., Welinder, P., et al.: Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464 (2018)
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998) [29] Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021) [30] Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR [31] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). 
PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) [32] Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) [33] Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). 
arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  21. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Overcoming exploration in reinforcement learning with demonstrations. In: IEEE International Conference on Robotics and Automation, pp. 6292–6299 (2018). IEEE
  22. Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C.M., Eysenbach, B., Levine, S.: Learning to reach goals via iterated supervised learning. In: International Conference on Learning Representations (2020)
  23. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
  24. Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020)
  25. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR
  26. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019)
  27. Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998)
  28. Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021)
  29. Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR
  30. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  31. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
  32. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 832–837 (1956)
  33. Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR
  34. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289 (2015)
  35. Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  24. Pitis, S., Chan, H., Zhao, S.: mrl: modular RL. GitHub (2020)
  25. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015). PMLR
  26. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995 (2019)
  27. Sutton, R.S.: Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998)
  28. Gehring, J., Synnaeve, G., Krause, A., Usunier, N.: Hierarchical skills for efficient exploration. Advances in Neural Information Processing Systems 34, 11553–11564 (2021)
  29. Fruit, R., Lazaric, A.: Exploration-exploitation in MDPs with options. In: Artificial Intelligence and Statistics, pp. 576–584 (2017). PMLR
  30. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  31. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
  32. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. The annals of mathematical statistics, 832–837 (1956) [34] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  33. Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., Davidson, J.: Learning latent dynamics for planning from pixels. In: International Conference on Machine Learning, pp. 2555–2565 (2019). PMLR [35] Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  34. Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015) [36] Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019) Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)
  35. Lee, L., Eysenbach, B., Parisotto, E., Xing, E., Levine, S., Salakhutdinov, R.: Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274 (2019)