
A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models (2403.16730v1)

Published 25 Mar 2024 in cs.RO

Abstract: In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundation models, to obtain a robotic skill learning system. The system acquires new skills from teleoperated demonstrations via behavioral cloning with visuomotor diffusion policies. Foundation models perform skill selection given the user's natural-language prompt. Before a skill is executed, the foundation model performs a precondition check given an observation of the workspace. To this end, we compare the performance of different foundation models and give a detailed experimental evaluation of the skills taught by the user in simulation and in the real world. Finally, we showcase the combined system on a challenging food serving scenario in the real world. Videos of all experimental executions, as well as of the process of teaching new skills in simulation and the real world, are available on the project's website.
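
The control flow described in the abstract (select a skill from the prompt, check its precondition against a workspace observation, then execute the learned policy) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `Skill`, `run_prompt`, `keyword_select`, and `substring_check` are hypothetical, observations are plain strings instead of camera images, and the two foundation-model calls and the diffusion policy are replaced by trivial stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Observation = str  # assumption: a text stand-in for a workspace image


@dataclass
class Skill:
    name: str
    precondition: str                      # natural-language condition to verify
    policy: Callable[[Observation], str]   # stand-in for a trained diffusion policy


def run_prompt(
    prompt: str,
    observation: Observation,
    skills: Dict[str, Skill],
    select_skill: Callable[[str, List[str]], Optional[str]],
    check_precondition: Callable[[str, Observation], bool],
) -> str:
    """Select a skill, check its precondition, then execute it."""
    name = select_skill(prompt, list(skills))
    if name not in skills:
        return "no matching skill"
    skill = skills[name]
    if not check_precondition(skill.precondition, observation):
        return f"precondition failed: {skill.precondition}"
    return skill.policy(observation)


# Stand-ins for the foundation model (hypothetical, keyword-based).
def keyword_select(prompt: str, names: List[str]) -> Optional[str]:
    for n in names:
        if n.replace("_", " ") in prompt.lower():
            return n
    return None


def substring_check(precondition: str, observation: Observation) -> bool:
    return precondition in observation


skills = {
    "serve_rice": Skill(
        name="serve_rice",
        precondition="bowl on table",
        policy=lambda obs: "executed serve_rice",
    )
}

ok = run_prompt("Please serve rice", "bowl on table, spoon in pot",
                skills, keyword_select, substring_check)
blocked = run_prompt("Please serve rice", "empty table",
                     skills, keyword_select, substring_check)
unknown = run_prompt("Do a dance", "bowl on table",
                     skills, keyword_select, substring_check)
```

In a real system the two callables would presumably wrap queries to a multimodal model and `policy` would roll out a trained visuomotor policy; passing them in as arguments keeps the sketch independent of any particular model or robot interface.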

