LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation (2405.05363v1)
Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for the object navigation task in complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation scheme and prompt templates to improve stability during training and zero-shot inference. We implement our method on the Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. Our proposed method achieves an improvement of 1.38-13.38% in text-to-image recall across different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and the real world, with improvements of 5% and 16.67% in navigation success rate, respectively.
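To make the retrieval setting concrete, below is a minimal sketch of object-centric text-to-image retrieval and the recall@k metric mentioned in the abstract. It is not the authors' implementation: it assumes per-object image embeddings and a text-query embedding produced by some VLM encoder, and all names (`score_image`, `recall_at_k`, the 512-dimensional embedding size) are illustrative placeholders, with random vectors standing in for real VLM features.

```python
import numpy as np

def cosine_sim(query, vectors):
    # Cosine similarity between one query vector and a batch of vectors.
    query = query / np.linalg.norm(query)
    vectors = vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors @ query

def score_image(query_emb, object_embs):
    # Object-centric scoring: an image is represented by its per-object
    # embeddings, and its relevance to a text query is the best-matching object.
    return cosine_sim(query_emb, object_embs).max()

def recall_at_k(query_embs, images, gt_indices, k=1):
    # images: list where each element is a (num_objects, dim) array of
    # per-object embeddings for one image; gt_indices[i] is the index of the
    # ground-truth image for query i.
    hits = 0
    for q, gt in zip(query_embs, gt_indices):
        scores = np.array([score_image(q, objs) for objs in images])
        topk = np.argsort(-scores)[:k]
        hits += int(gt in topk)
    return hits / len(query_embs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512  # placeholder, CLIP-style embedding size
    images = [rng.normal(size=(int(rng.integers(3, 8)), dim)) for _ in range(20)]
    # Synthetic queries: each query is a noisy copy of one object in its image.
    queries = np.stack([images[i][0] + 0.1 * rng.normal(size=dim) for i in range(20)])
    print("recall@5:", recall_at_k(queries, images, gt_indices=list(range(20)), k=5))
```

In the paper this protocol would be run with VLM features fine-tuned using the proposed object-centric losses; the random vectors here only exercise the metric itself.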
Authors: Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha