"Set It Up!": Functional Object Arrangement with Compositional Generative Models (2405.11928v3)
Abstract: This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages LLMs to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.
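The two-stage idea in the abstract (propose abstract relationships, then ground them into poses by composing one model per relationship) can be illustrated with a toy sketch. This is not the authors' implementation. The relation names, the hand-written energy gradients, the `solve_poses` helper, and the hard-coded constraint list are all assumptions made for illustration. In SetItUp the constraints would come from an LLM and each relation would be backed by a learned diffusion model. Here, simple analytic energies stand in for those models, and composition is done by summing their gradients inside an annealed, Langevin-style update on 2D positions.

```python
import numpy as np

# Hypothetical relation library. Each entry returns the gradient of a simple
# energy with respect to the 2D positions of the two objects it relates.
# These hand-written energies are stand-ins for learned diffusion models.

def grad_left_of(a, b, margin=0.2):
    """Gradient of E = max(0, a_x - b_x + margin)^2 w.r.t. a and b."""
    violation = max(0.0, a[0] - b[0] + margin)
    g = np.array([2.0 * violation, 0.0])
    return g, -g

def grad_aligned_y(a, b):
    """Gradient of E = (a_y - b_y)^2 w.r.t. a and b."""
    g = np.array([0.0, 2.0 * (a[1] - b[1])])
    return g, -g

RELATIONS = {"left_of": grad_left_of, "aligned_y": grad_aligned_y}

def solve_poses(objects, constraints, steps=500, lr=0.05, noise0=0.1, seed=0):
    """Compose relation energies by summing gradients, with annealed noise."""
    rng = np.random.default_rng(seed)
    poses = {name: rng.uniform(-1.0, 1.0, size=2) for name in objects}
    for t in range(steps):
        grads = {name: np.zeros(2) for name in objects}
        # Composition step: every constraint adds its gradient to its objects.
        for rel, a, b in constraints:
            ga, gb = RELATIONS[rel](poses[a], poses[b])
            grads[a] += ga
            grads[b] += gb
        # Noise scale decays linearly and reaches zero on the final step.
        sigma = noise0 * (1.0 - (t + 1) / steps)
        for name in objects:
            noise = sigma * np.sqrt(lr) * rng.normal(size=2)
            poses[name] = poses[name] - lr * grads[name] + noise
    return poses

if __name__ == "__main__":
    # Hypothetical constraint graph, standing in for an LLM proposal.
    objects = ["fork", "plate", "knife"]
    constraints = [
        ("left_of", "fork", "plate"),
        ("left_of", "plate", "knife"),
        ("aligned_y", "fork", "plate"),
        ("aligned_y", "plate", "knife"),
    ]
    for name, xy in solve_poses(objects, constraints).items():
        print(name, np.round(xy, 3))
```

Because each relation contributes its own term, adding an object or a relationship only changes the constraint list, not the solver. That is the property the abstract's graph-like intermediate representation is meant to provide.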