
Behavioral Cloning via Search in Embedded Demonstration Dataset (2306.09082v1)

Published 15 Jun 2023 in cs.AI

Abstract: Behavioral cloning learns a policy from a dataset of demonstrations. To overcome various learning and policy-adaptation problems, we propose using a latent space to index a demonstration dataset, instantly retrieve similar relevant experiences, and copy behavior from these situations. The agent can perform the actions from a selected similar situation until the latent representations of the agent's current situation and the selected experience diverge. We thus formulate the control problem as a search problem over a dataset of expert demonstrations. We test our approach on the BASALT MineRL dataset using the latent representation of a Video PreTraining (VPT) model, and compare it to state-of-the-art Minecraft agents. Our approach effectively recovers meaningful demonstrations and exhibits human-like agent behavior in the Minecraft environment across a wide variety of scenarios. Experimental results show that the performance of our search-based approach is comparable to that of trained models, while allowing zero-shot task adaptation simply by changing the demonstration examples.
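The abstract describes a search-and-replay control loop: embed the current observation, retrieve the nearest demonstration frame in latent space, replay the expert's recorded actions, and search again once the agent drifts away from the replayed trajectory. Below is a minimal sketch of that loop, not the paper's exact procedure: it assumes a frozen encoder `encode` (e.g. a VPT backbone) mapping an observation to a latent vector, a Gym-style environment, frame-aligned arrays `demo_embeddings` (N x d) and `demo_actions`, and a fixed L2 divergence threshold, all of which are illustrative assumptions.

```python
import numpy as np

def nearest_demo_index(query, demo_embeddings):
    """Index of the demonstration frame whose embedding is closest (L2) to the query."""
    return int(np.argmin(np.linalg.norm(demo_embeddings - query, axis=1)))

def search_based_control(env, encode, demo_embeddings, demo_actions,
                         divergence_threshold=1.0, max_steps=1000):
    """Search-and-replay behavioral cloning: find the most similar demonstrated
    situation in latent space, copy its actions, and search again once the
    agent's latent state diverges from the replayed demonstration."""
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        z = encode(obs)                                  # embed the current situation
        i = nearest_demo_index(z, demo_embeddings)       # search the demonstration dataset
        while i < len(demo_actions) and steps < max_steps:
            obs, _, done, _ = env.step(demo_actions[i])  # copy the expert's action
            steps += 1
            if done:
                return
            i += 1
            z = encode(obs)
            # Re-search when agent and demonstration diverge in latent space.
            if i >= len(demo_embeddings) or \
               np.linalg.norm(z - demo_embeddings[i]) > divergence_threshold:
                break
```

At the scale of the MineRL dataset, the brute-force nearest-neighbor search above would typically be replaced by an approximate index (e.g. FAISS) so that similar experiences can be retrieved instantly, and zero-shot task adaptation amounts to swapping in a different set of demonstration embeddings and actions.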
