AutoGPT+P: Affordance-based Task Planning with Large Language Models (2402.10778v2)
Abstract: Recent advances in task planning leverage LLMs to improve generalizability by combining such models with classical planning algorithms to address their inherent limitations in reasoning capabilities. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and the objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects, by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an automatically generated object-affordance mapping using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98% on the SayCan instruction set, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan. Furthermore, on our newly created dataset of 150 scenarios covering a wide range of complex tasks with missing objects, our approach achieves a success rate of 79%. The dataset and the code are publicly available at https://git.h2t.iar.kit.edu/birr/autogpt-p-standalone.
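The abstract's key idea, deriving the planning domain from an affordance-based scene representation, can be illustrated with a minimal sketch. The affordance names, the affordance-to-type mapping, and the `derive_problem_objects` helper below are hypothetical stand-ins, not the authors' implementation; the sketch only shows how per-object affordances might be turned into typed PDDL object declarations that a symbolic planner could consume.

```python
# Illustrative sketch (not the authors' implementation): turning an
# affordance-based scene representation into PDDL-style object declarations.
# Affordance names, type names, and the mapping below are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    """A detected object paired with its affordances, e.g. produced by
    object detection plus an automatically generated object-affordance
    mapping."""
    name: str
    affordances: set = field(default_factory=set)

# Map each affordance to the planner type it licenses: an object with the
# "graspable" affordance may appear wherever the domain expects something
# the robot can pick up.
AFFORDANCE_TO_TYPE = {
    "graspable": "pickupable",
    "containment": "container",
    "support": "surface",
}

def derive_problem_objects(scene):
    """Emit a PDDL (:objects ...) block from the affordances in the scene."""
    lines = []
    for obj in scene:
        types = sorted(AFFORDANCE_TO_TYPE[a] for a in obj.affordances
                       if a in AFFORDANCE_TO_TYPE)
        for t in types:
            lines.append(f"{obj.name} - {t}")
    return "(:objects\n  " + "\n  ".join(lines) + ")"

scene = [
    SceneObject("cup1", {"graspable", "containment"}),
    SceneObject("table1", {"support"}),
]
print(derive_problem_objects(scene))
```

Because the typing is derived from affordances rather than from a fixed object catalogue, the same domain can be instantiated for arbitrary detected objects, which is the property the abstract attributes to the affordance-based representation.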
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. 2023.
- ARMAR-6: A high-performance humanoid for human-robot collaboration in real-world scenarios. IEEE Robotics & Automation Magazine, 26(4):108–121, 2019.
- Affordance-based reasoning in robot task planning. In Planning and Robotics (PlanRob) Workshop ICAPS-2013, 2013.
- Finding Ways to Get the Job Done: An Affordance-Based Approach. International Conference on Automated Planning and Scheduling, 24(1):499–503, 2014.
- The role of functional affordances in socializing robots. International Journal of Social Robotics, 7:421–438, 2015.
- Gibsonian affordances for roboticists. Adaptive Behavior, 15(4):473–480, 2007.
- AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers. arXiv preprint arXiv:2306.06531, 2023.
- Toward affordance detection and ranking on novel objects for real-world robotic manipulation. IEEE Robotics & Automation Letters, 4(4):4070–4077, 2019a.
- Recognizing object affordances to support scene reasoning for manipulation tasks. arXiv preprint arXiv:1909.05770, 2019b.
- Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds. arXiv preprint arXiv:2305.17590, 2023.
- K-vil: Keypoints-based visual imitation learning. IEEE Transactions on Robotics, 39(5):3888–3908, 2023.
- James J Gibson. The theory of affordances. Hilldale, USA, 1(2):67–82, 1977.
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. arXiv preprint arXiv:2305.11738, 2023.
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning. arXiv preprint arXiv:2305.14909, 2023.
- M. Helmert. The Fast Downward Planning System. Journal of Artificial Intelligence Research, 26:191–246, 2006.
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. arXiv preprint arXiv:2201.07207, 2022a.
- Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv preprint arXiv:2207.05608, 2022b.
- ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation, November 2022. URL https://doi.org/10.5281/zenodo.7347926.
- Large Language Models Struggle to Learn Long-Tail Knowledge. In International Conference on Machine Learning, 2022.
- Interactive and incremental learning of spatial object relations from human demonstrations. Frontiers in Robotics & AI, 10:1–14, 2023.
- Semantic scene manipulation based on 3d spatial object relations and language instructions. In IEEE-RAS International Conference on Humanoid Robots, pages 306–313, 2021.
- Megapose: 6d pose estimation of novel objects via render & compare. arXiv preprint arXiv:2212.06870, 2022.
- Code as Policies: Language Model Programs for Embodied Control. arXiv preprint arXiv:2209.07753, 2023.
- Text2Motion: From Natural Language Instructions to Feasible Plans. arXiv preprint arXiv:2303.12153, 2023.
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv preprint arXiv:2304.11477, 2023.
- A review of methodologies for natural-language-facilitated human–robot cooperation. International Journal of Advanced Robotic Systems, 16(3):1729881419851402, 2019.
- Grounding planning operators by affordances. In International Conference on Cognitive Systems (CogSys), pages 79–84, 2008.
- Relational affordances for multiple-object manipulation. Autonomous Robots, 42:19–44, 2018.
- SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning. arXiv preprint arXiv:2307.06135, 2023.
- Evaluation of Pretrained Large Language Models in Embodied Planning Tasks. In Artificial General Intelligence, pages 222–232, 2023.
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. In IEEE International Conference on Robotics and Automation, 2023.
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. arXiv preprint arXiv:2212.04088, 2023.
- Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). arXiv preprint arXiv:2206.10498, 2023.
- ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application. IEEE Access, 11:95060–95078, 2023.
- Robot learning and use of affordances in goal-directed tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2288–2294, 2013.
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents. arXiv preprint arXiv:2302.01560, 2023.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903, 2023.
- TidyBot: Personalized Robot Assistance with Large Language Models. arXiv preprint arXiv:2305.05658, 2023a.
- Embodied Task Planning with Large Language Models. arXiv preprint arXiv:2307.01848, 2023b.
- Translating Natural Language to Planning Goals with Large-Language Models. arXiv preprint arXiv:2302.05128, 2023.
- SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following. IEEE Robotics & Automation Letters, 7(4):10375–10382, 2022.
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning. arXiv preprint arXiv:2305.14078, 2023.
- ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning. arXiv preprint arXiv:2308.13724, 2023.
- Timo Birr
- Christoph Pohl
- Abdelrahman Younes
- Tamim Asfour