Interpretable Robotic Manipulation from Language (2405.17047v1)
Abstract: Humans naturally use linguistic instructions to convey knowledge, but following such instructions is significantly harder for machines, especially in multitask robotic manipulation environments. Natural language is also the primary medium through which humans acquire new knowledge, which makes it a potentially intuitive bridge for translating human-understandable concepts into formats that machines can learn. To facilitate this integration, we introduce Ex-PERACT, an explainable behavior cloning agent designed for manipulation tasks. The agent has a hierarchical structure that incorporates natural language to enhance learning. At the top level, the model learns a discrete skill code; at the bottom level, the policy network represents the problem as a voxelized grid and maps the discretized actions onto that voxel grid. We evaluate our method on eight challenging manipulation tasks from the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
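The abstract names two discretization steps without giving their details: selecting a discrete skill code at the top level, and mapping actions onto a voxel grid at the bottom level. The sketch below illustrates one common way such steps can be realized, namely a nearest-neighbor codebook lookup and a uniform workspace grid. It is an assumption-laden illustration, not the Ex-PERACT implementation: the function names, the codebook values, the workspace bounds, and the grid size are all hypothetical.

```python
from typing import Sequence, Tuple


def nearest_skill_code(embedding: Sequence[float],
                       codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to `embedding`.

    Hypothetical stand-in for a discrete skill code: the index of the
    nearest entry under squared Euclidean distance.
    """
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((e - c) ** 2 for e, c in zip(embedding, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def position_to_voxel(position: Sequence[float],
                      bounds_min: Sequence[float],
                      bounds_max: Sequence[float],
                      grid_size: int) -> Tuple[int, ...]:
    """Map a continuous 3D position to integer voxel indices.

    Assumes a uniform grid with `grid_size` cells per axis; positions
    outside the workspace bounds are clamped to the nearest edge voxel.
    """
    indices = []
    for p, lo, hi in zip(position, bounds_min, bounds_max):
        fraction = (p - lo) / (hi - lo)
        index = int(fraction * grid_size)
        indices.append(min(max(index, 0), grid_size - 1))
    return tuple(indices)


def voxel_to_position(index: Sequence[int],
                      bounds_min: Sequence[float],
                      bounds_max: Sequence[float],
                      grid_size: int) -> Tuple[float, ...]:
    """Return the continuous coordinates of a voxel's center."""
    return tuple(
        lo + (i + 0.5) * (hi - lo) / grid_size
        for i, lo, hi in zip(index, bounds_min, bounds_max)
    )


if __name__ == "__main__":
    # Hypothetical 3-entry codebook of 2D skill embeddings.
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(nearest_skill_code([0.9, 0.1], codebook))

    # Hypothetical 2 m cubic workspace split into 10 voxels per axis.
    lo, hi = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
    voxel = position_to_voxel((0.0, 0.0, 0.0), lo, hi, 10)
    print(voxel, voxel_to_position(voxel, lo, hi, 10))
```

Under these assumptions, a predicted action is a voxel index rather than a continuous coordinate, so the policy's output can be read off as a location in the grid. The recovered position is the voxel center, which means the round trip loses at most half a voxel width per axis.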