Topological Planning with Transformers for Vision-and-Language Navigation (2012.05292v1)

Published 9 Dec 2020 in cs.RO, cs.AI, cs.CL, and cs.CV

Abstract: Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
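As a rough illustration of the modular split the abstract describes, the sketch below takes a navigation plan, expressed as a sequence of nodes in a topological map, and expands it into low-level forward and rotate actions. It is not the paper's controller or planner: the function name `plan_to_actions`, the 2D node coordinates, the 0.25 m step, and the 15 degree turn increment are all assumptions made for illustration, and the attention-based plan prediction is not modeled here.

```python
import math

def plan_to_actions(positions, plan, heading_deg=0.0, step=0.25, turn=15.0):
    """Expand a node-level plan into discrete low-level actions.

    positions: dict mapping node id -> (x, y) in metres (hypothetical layout)
    plan: list of node ids to visit in order
    heading_deg: initial agent heading, measured counter-clockwise from +x
    step, turn: assumed action granularity (metres per forward, degrees per rotate)
    """
    actions = []
    for src, dst in zip(plan, plan[1:]):
        (x0, y0), (x1, y1) = positions[src], positions[dst]
        # Bearing from the current node to the next node in the plan.
        target = math.degrees(math.atan2(y1 - y0, x1 - x0))
        # Signed smallest rotation, wrapped into [-180, 180).
        delta = (target - heading_deg + 180.0) % 360.0 - 180.0
        n_turns = round(abs(delta) / turn)
        actions += ["rotate_left" if delta > 0 else "rotate_right"] * n_turns
        # Track the heading actually reached after the discretized turn.
        heading_deg = (heading_deg + math.copysign(n_turns * turn, delta)) % 360.0
        # Cover the edge length with a whole number of forward steps.
        n_steps = round(math.hypot(x1 - x0, y1 - y0) / step)
        actions += ["forward"] * n_steps
    return actions

# Hypothetical three-node map; the plan visits a, then b, then c.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}
actions = plan_to_actions(positions, ["a", "b", "c"])
print(actions)
```

Because the plan is only a list of nodes, a backtracking plan such as `["a", "b", "a"]` is handled the same way as any other. This open-loop expansion ignores obstacles and actuation noise, which a robust controller would have to handle.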

Citations (91)
