GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks (2404.06645v1)
Abstract: LLMs have been successful at generating robot policy code, but so far these results have been limited to high-level tasks that do not require precise movement. It is an open question how well such approaches work for tasks that require reasoning over contact forces and working within tight success tolerances. We find that, with the right action space, LLMs are capable of successfully generating policies for a variety of contact-rich and high-precision manipulation tasks, even under noisy conditions such as perceptual errors or grasping inaccuracies. Specifically, we reparameterize the action space to include compliance, with constraints on the interaction forces and stiffnesses involved in reaching a target pose. We validate this approach on subtasks derived from the Functional Manipulation Benchmark (FMB) and NIST Task Board Benchmarks. Exposing this action space alongside methods for estimating object poses improves policy generation with an LLM by more than 3x and 4x when compared to non-compliant action spaces.
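To make the idea of a compliant action space concrete, here is a minimal sketch of what "reaching a target pose under stiffness and force constraints" could look like as a single action. It is an illustration only: the names `CompliantMove` and `commanded_force`, the per-axis stiffness layout, and the numeric values are assumptions made for this example and are not taken from the paper or from any robot API.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CompliantMove:
    """Hypothetical action: reach a target position under stiffness and force limits."""
    target: Sequence[float]     # desired end-effector position (m)
    stiffness: Sequence[float]  # per-axis translational stiffness (N/m)
    max_force: float            # cap on the commanded force magnitude (N)


def commanded_force(action: CompliantMove, position: Sequence[float]) -> List[float]:
    """Spring-like force toward the target, rescaled so its norm never exceeds max_force."""
    force = [k * (t - p) for k, t, p in zip(action.stiffness, action.target, position)]
    norm = math.sqrt(sum(f * f for f in force))
    if norm > action.max_force:
        scale = action.max_force / norm
        force = [f * scale for f in force]
    return force


# Assumed example values: stiff in x/y to hold alignment, soft in z for the insertion.
insert = CompliantMove(target=(0.0, 0.0, -0.02),
                       stiffness=(800.0, 800.0, 200.0),
                       max_force=5.0)
print(commanded_force(insert, (0.0, 0.0, 0.0)))
```

In this sketch, a non-compliant action would correspond to commanding the target position alone. The stiffness and force cap are the extra parameters that a generated policy could set per motion, for example lowering stiffness along the insertion axis.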