Language-guided Active Sensing of Confined, Cluttered Environments via Object Rearrangement Planning (2402.02308v1)
Abstract: Language-guided active sensing is a robotics subtask in which a robot with an onboard sensor interacts efficiently with the environment via object manipulation to maximize perceptual information, following given language instructions. Such tasks arise in many practical robotics applications, such as household service, search and rescue, and environment monitoring. Despite these applications, existing works do not account for language instructions and have mainly focused on surface sensing, i.e., perceiving the environment from the outside without rearranging it for dense sensing. Therefore, in this paper, we introduce the first language-guided active sensing approach that allows users to observe specific parts of the environment via object manipulation. Our method spatially associates the environment with the language instructions, determines the best camera viewpoints for perception, and then iteratively selects and relocates the most view-blocking objects to provide dense perception of the region of interest. We evaluate our method against several baseline algorithms in simulation and also demonstrate it in real-world confined, cabinet-like settings with multiple unknown objects. Our results show that the proposed method performs better across different metrics and successfully generalizes to complex real-world scenarios.
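The abstract describes an iterative loop: ground the instruction to a region of interest (ROI), select the viewpoint with the highest expected information gain, sense, and relocate the object that hides the most of the still-unobserved ROI. The following Python snippet is a minimal, purely illustrative sketch of that loop under a 1-D "shelf" abstraction; it is not the paper's implementation, and the scene model, object footprints, and selection heuristics are assumptions made only to show the control flow.

```python
"""Toy sketch (not the authors' implementation) of the loop in the abstract: ground an
instruction to a region of interest (ROI), pick the view with the highest expected
information gain, and relocate the most view-blocking object until the ROI is densely
observed. The 1-D shelf model and all numbers here are illustrative assumptions."""

roi = [False] * 10                                   # ROI cells: observed yet?
objects = {"mug": {3, 4}, "box": {6, 7, 8}}          # object name -> occupied cells

def visible_cells(view_side, objects):
    """Cells visible from the 'left' or 'right' opening before the first blocking object."""
    blocked = set().union(*objects.values()) if objects else set()
    order = range(len(roi)) if view_side == "left" else reversed(range(len(roi)))
    seen = set()
    for c in order:
        if c in blocked:
            break
        seen.add(c)
    return seen

def information_gain(view_side):
    """Number of still-unobserved ROI cells this viewpoint would reveal."""
    return sum(1 for c in visible_cells(view_side, objects) if not roi[c])

for step in range(5):
    view = max(["left", "right"], key=information_gain)   # next-best view
    for c in visible_cells(view, objects):
        roi[c] = True                                      # "sense" the visible cells
    if all(roi):
        break
    # Relocate the object concealing the most unobserved ROI cells.
    blocker = max(objects, key=lambda o: sum(1 for c in objects[o] if not roi[c]))
    print(f"step {step}: viewed from the {view}, relocating '{blocker}'")
    del objects[blocker]                                   # object moved out of the cabinet

print("ROI coverage:", sum(roi) / len(roi))
```

In the paper's setting the same roles are played by learned and planning components (language grounding, viewpoint evaluation, and manipulation planning for the relocations); the toy only mirrors the sense-then-rearrange structure.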