Balancing Performance and Efficiency in Zero-shot Robotic Navigation (2406.03015v1)
Abstract: We present an optimization study of Vision-Language Frontier Maps (VLFM) applied to the Object Goal Navigation task in robotics. Our work evaluates the efficiency and performance of various vision-language models, object detectors, segmentation models, and multi-modal comprehension and Visual Question Answering modules. Using the $\textit{val-mini}$ and $\textit{val}$ splits of the Habitat-Matterport 3D dataset, we conduct experiments on a desktop with limited VRAM. We propose a solution that achieves a higher success rate (+1.55%) than the VLFM BLIP-2 baseline without a substantial loss in success-weighted path length, while requiring $\textbf{2.3 times}$ less video memory. Our findings provide insights into balancing model performance and computational efficiency, suggesting effective deployment strategies for resource-limited environments.
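The abstract reports results in terms of success rate and success-weighted path length (SPL), the standard Object Goal Navigation metrics. As a minimal sketch, the snippet below computes both metrics from per-episode outcomes using the common SPL definition (success times the ratio of shortest-path length to the greater of shortest-path and actual path length, averaged over episodes); the function name and the toy episode numbers are illustrative, not taken from the paper.

```python
import numpy as np

def success_rate_and_spl(successes, shortest_lengths, path_lengths):
    """Compute Success Rate and Success-weighted Path Length (SPL).

    successes        : 1 if the agent stopped at the goal object, else 0
    shortest_lengths : geodesic shortest-path distance to the goal per episode
    path_lengths     : length of the path the agent actually traveled
    """
    s = np.asarray(successes, dtype=float)
    l = np.asarray(shortest_lengths, dtype=float)
    p = np.asarray(path_lengths, dtype=float)

    sr = s.mean()                                # fraction of successful episodes
    spl = np.mean(s * l / np.maximum(p, l))      # success weighted by path efficiency
    return sr, spl

# Toy example with three hypothetical episodes (values are illustrative only)
sr, spl = success_rate_and_spl(
    successes=[1, 0, 1],
    shortest_lengths=[5.0, 8.0, 3.0],
    path_lengths=[6.5, 4.0, 3.0],
)
print(f"SR = {sr:.2f}, SPL = {spl:.2f}")  # SR = 0.67, SPL = 0.59
```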