LBAP: Improved Uncertainty Alignment of LLM Planners using Bayesian Inference (2403.13198v3)
Abstract: LLMs showcase many desirable traits for intelligent and helpful robots. However, they are also known to hallucinate predictions. This issue is exacerbated in robotics, where LLM hallucinations may result in robots confidently executing plans that are contrary to user goals or relying more frequently than necessary on human assistance. In this work, we present LBAP, a novel approach for utilizing off-the-shelf LLMs, alongside Bayesian inference for uncertainty Alignment in robotic Planners, that minimizes hallucinations and human intervention. Our key finding is that we can use Bayesian inference to more accurately calibrate a robot's confidence measure by accounting for both scene grounding and world knowledge. This process allows us to mitigate hallucinations and better align the LLM's confidence measure with the probability of success. Through experiments in both simulation and the real world on tasks with a variety of ambiguities, we show that LBAP significantly increases success rate and decreases the amount of human intervention required relative to prior art. For example, in our real-world testing paradigm, LBAP decreases the human help rate of previous methods by over 33% at a success rate of 70%.
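The abstract does not spell out the update rule, but the described combination of LLM world knowledge with scene grounding suggests a Bayesian product of a prior and a likelihood, normalized over candidate plan steps. The sketch below is a minimal illustration under that assumption; the function name, the scoring inputs, and the help threshold are hypothetical and not taken from the paper.

```python
import math

def bayesian_confidence(llm_logprobs, scene_scores):
    """Combine LLM world-knowledge priors with scene-grounding likelihoods.

    llm_logprobs: log-probability the LLM assigns each candidate plan step
                  (world-knowledge prior).
    scene_scores: score in [0, 1] for each candidate, e.g. from an
                  open-vocabulary detector, reflecting how well the step is
                  grounded in the current scene (likelihood term).
    Returns a normalized posterior confidence over the candidates.
    """
    # Unnormalized posterior: prior (LLM) x likelihood (scene grounding).
    unnorm = [math.exp(lp) * s for lp, s in zip(llm_logprobs, scene_scores)]
    z = sum(unnorm) or 1.0
    return [u / z for u in unnorm]

# Illustrative usage: ask for help only when the top posterior is not confident enough.
options = ["place the bowl in the microwave", "place the bowl in the oven"]
posterior = bayesian_confidence(llm_logprobs=[-0.4, -1.1], scene_scores=[0.9, 0.2])
ask_for_help = max(posterior) < 0.8  # threshold chosen for illustration only
```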