Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning (2404.03891v1)
Abstract: In robotics, the use of LLMs is becoming prevalent, especially for understanding human commands. In particular, LLMs are utilized as domain-agnostic task planners for high-level human commands. LLMs are capable of Chain-of-Thought (CoT) reasoning, which enables them to serve as task planners. However, modern robots still struggle to perform complex actions, and the domains where robots can be deployed remain limited in practice. This leads us to pose a question: if small LMs can be trained to reason in chains within a single domain, would even small LMs be good task planners for robots? To train smaller LMs to reason in chains, we build `COmmand-STeps datasets' (COST), consisting of high-level commands along with corresponding actionable low-level steps, via LLMs. We release not only our datasets but also the prompt templates used to generate them, so that anyone can build datasets for their own domain. We compare GPT3.5 and GPT4 with a finetuned GPT2 on task domains in tabletop and kitchen environments, and the results show that GPT2-medium is comparable to GPT3.5 for task planning in a specific domain. Our dataset, code, and more output samples can be found at https://github.com/Gawon-Choi/small-LMs-Task-Planning
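The abstract describes pairing each high-level command with actionable low-level steps to finetune a small causal LM. As a minimal sketch of that idea, the snippet below serializes one command-steps pair into a single training string suitable for GPT-2-style finetuning; the field labels, numbering scheme, and end-of-text separator are illustrative assumptions, not the paper's actual COST format.

```python
# Sketch: turn a (command, steps) pair into one training example for
# causal-LM finetuning. Labels and separators are assumptions, not the
# paper's actual dataset format.

def format_example(command: str, steps: list[str]) -> str:
    """Serialize a high-level command and its low-level steps as one string."""
    step_lines = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return f"Command: {command}\nSteps:\n{step_lines}\n<|endoftext|>"

example = format_example(
    "put the apple in the bowl",
    [
        "locate the apple",
        "pick up the apple",
        "move to the bowl",
        "place the apple in the bowl",
    ],
)
print(example)
```

At training time, strings like this would be tokenized and fed to the small LM so it learns to continue a `Command:` prompt with a numbered chain of steps.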