Evaluating Spatial Understanding of Large Language Models (2310.14540v3)
Abstract: LLMs show remarkable capabilities across a variety of tasks. Despite being trained only on text, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge: spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2-series models, to represent and reason about spatial structures. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. Extensive error analysis shows that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs capture certain aspects of spatial structure implicitly, but substantial room for improvement remains.
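The paper's exact prompts are not reproduced here, but the kind of natural-language navigation task it describes can be illustrated with a minimal sketch. The snippet below (all names are illustrative, not from the paper) builds a ring of rooms, generates a text description of a random walk around it, and computes the ground-truth final position against which a model's answer could be scored:

```python
import random

def build_ring(n):
    """Adjacency list for a ring of n rooms: room i connects to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def describe_walk(n, steps, seed=0):
    """Generate a natural-language walk around an n-room ring.

    Returns a (prompt, answer) pair, where `answer` is the ground-truth
    final room a model's response would be checked against.
    """
    rng = random.Random(seed)
    pos = 0
    lines = [f"You are in room 0 of a circle of {n} rooms."]
    for _ in range(steps):
        direction = rng.choice(["clockwise", "counterclockwise"])
        # Clockwise increments the room index, counterclockwise decrements it.
        pos = (pos + 1) % n if direction == "clockwise" else (pos - 1) % n
        lines.append(f"You move one room {direction}.")
    lines.append("Which room are you in now?")
    return "\n".join(lines), pos

prompt, answer = describe_walk(n=6, steps=4, seed=1)
print(prompt)
print("ground truth:", answer)
```

Analogous generators for grids and trees would differ only in the adjacency structure and the move vocabulary (e.g. "north"/"south" on a square grid, "parent"/"child" on a tree).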