Language-guided Active Sensing of Confined, Cluttered Environments via Object Rearrangement Planning (2402.02308v1)

Published 4 Feb 2024 in cs.RO

Abstract: Language-guided active sensing is a robotics subtask in which a robot with an onboard sensor interacts efficiently with the environment via object manipulation to maximize perceptual information, following given language instructions. Such tasks arise in practical robotics applications such as household service, search and rescue, and environment monitoring. Despite these many applications, existing work does not account for language instructions and has mainly focused on surface sensing, i.e., perceiving the environment from the outside without rearranging it for dense sensing. Therefore, in this paper, we introduce the first language-guided active sensing approach that allows users to observe specific parts of the environment via object manipulation. Our method spatially associates the environment with the language instruction, determines the best camera viewpoints for perception, and then iteratively selects and relocates the most view-blocking objects to provide dense perception of the region of interest. We evaluate our method against different baseline algorithms in simulation and also demonstrate it in real-world confined cabinet-like settings with multiple unknown objects. Our results show that the proposed method performs better across different metrics and successfully generalizes to complex real-world scenarios.
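
The abstract outlines an iterative loop: ground the instruction to a region of interest, check what the camera can see, and repeatedly relocate the object that blocks the most of the remaining region. The following is a minimal, self-contained toy sketch of that loop on a simplified shelf model with straight-line occlusion; every name, the grounding heuristic, and the shelf abstraction are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the "ground -> sense -> relocate blocker" loop from the abstract.
# Shelf model: rows are depth from the opening, columns are width; a front-facing
# camera sees each column up to the first occupied cell. All details are assumptions.

def ground_instruction(instruction, shelf_cols):
    """Hypothetical grounding step: map a phrase to a set of (depth, column) cells.
    Here we just pick the back rows of the left or right half of the shelf."""
    cols = range(shelf_cols // 2, shelf_cols) if "right" in instruction else range(shelf_cols // 2)
    return {(depth, c) for depth in (2, 3) for c in cols}

def visible_roi(shelf, roi):
    """ROI cells the camera can currently see (nothing in front of them in their column)."""
    seen = set()
    for c in range(len(shelf[0])):
        for depth in range(len(shelf)):
            if shelf[depth][c] is not None:      # occluder blocks everything behind it
                break
            if (depth, c) in roi:
                seen.add((depth, c))
    return seen

def best_blocker(shelf, roi):
    """Greedy choice: the object whose removal newly reveals the most ROI cells."""
    baseline = len(visible_roi(shelf, roi))
    objects = {cell for row in shelf for cell in row if cell is not None}
    best, best_gain = None, 0
    for obj in objects:
        trial = [[cell if cell != obj else None for cell in row] for row in shelf]
        gain = len(visible_roi(trial, roi)) - baseline
        if gain > best_gain:
            best, best_gain = obj, gain
    return best

def active_sense(shelf, instruction, target=1.0):
    """Iteratively relocate the most view-blocking object until the ROI is covered."""
    roi = ground_instruction(instruction, len(shelf[0]))
    relocated = []
    while len(visible_roi(shelf, roi)) / len(roi) < target:
        obj = best_blocker(shelf, roi)
        if obj is None:                          # no remaining relocation helps
            break
        relocated.append(obj)                    # stand-in for a real pick-and-place
        shelf = [[cell if cell != obj else None for cell in row] for row in shelf]
    return relocated, len(visible_roi(shelf, roi)) / len(roi)

if __name__ == "__main__":
    # 4 depths x 4 columns; strings are object ids, None is free space.
    shelf = [
        [None, "mug", None, "can"],
        [None, None, "box", None],
        [None, None, None, None],
        [None, None, None, None],
    ]
    moved, coverage = active_sense(shelf, "inspect the back right of the cabinet")
    print("relocated:", moved, "ROI coverage:", coverage)
```

In this toy run only the two objects occluding the right rear of the shelf ("can" and "box") are relocated; "mug" is left in place because moving it reveals nothing in the region of interest, mirroring the paper's idea of selecting only the most view-blocking objects.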
