Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations (2310.04566v2)
Abstract: Organizing scattered items in domestic spaces is challenging because tidiness is diverse and subjective. Just as human language allows the same idea to be expressed in many ways, household tidiness preferences and organizational patterns vary widely, so presetting object locations would limit adaptability to new objects and environments. Inspired by advancements in NLP, this paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness from demonstrations of well-organized layouts, akin to training large language models (LLMs) on conversational datasets. We leverage a transformer neural network to predict the placement of subsequent objects. We demonstrate a "knolling" system with a robotic arm and an RGB camera that organizes items of varying sizes and quantities on a table. Our method not only learns a generalizable concept of tidiness, enabling the model to provide diverse solutions and adapt to varying numbers of objects, but also incorporates human preferences to generate customized tidy layouts without explicit target positions for each object.
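To make the analogy to language modeling concrete, the sketch below (not the authors' code) illustrates the core idea the abstract describes: an autoregressive transformer that, given the objects placed so far, regresses where the next object should go, much like next-token prediction. The input encoding, layer sizes, loss, and the class name KnollingTransformerSketch are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of "next-object placement" prediction with a transformer.
# All design choices below are assumptions for illustration only.
import torch
import torch.nn as nn


class KnollingTransformerSketch(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Assumed object token: (length, width) of the object plus the (x, y)
        # position of the previously placed object (zeros for the first one).
        self.embed = nn.Linear(4, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Regress the (x, y) placement of each object on the table.
        self.head = nn.Linear(d_model, 2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_objects, 4). A causal mask keeps each
        # prediction conditioned only on objects placed before it.
        seq_len = tokens.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.head(h)


if __name__ == "__main__":
    model = KnollingTransformerSketch()
    tokens = torch.randn(1, 3, 4)       # toy batch: 3 objects
    target_xy = torch.randn(1, 3, 2)    # "tidy" positions from a demonstration
    pred_xy = model(tokens)
    # Self-supervised objective: fit the placements observed in tidy layouts.
    loss = nn.functional.mse_loss(pred_xy, target_xy)
    print(pred_xy.shape, loss.item())
```

Training such a model only requires examples of tidy layouts, since each arrangement supplies both the inputs (objects already placed) and the targets (where the next object actually went); no hand-specified goal positions are needed at inference time.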