Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning (2402.00251v1)

Published 1 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Step-by-step decision planning with LLMs is gaining attention in AI agent development. This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in LLMs. Existing approaches are either white-box or computationally demanding, limiting use of black-box proprietary LLMs within budgets. The paper's first contribution is a non-parametric uncertainty quantification method for LLMs, efficiently estimating point-wise dependencies between input-decision on the fly with a single inference, without access to token logits. This estimator informs the statistical interpretation of decision trustworthiness. The second contribution outlines a systematic design for a decision-making agent, generating actions like "turn on the bathroom light" based on user prompts such as "take a bath". Users will be asked to provide preferences when more than one action has high estimated point-wise dependencies. In conclusion, our uncertainty estimation and decision-making agent design offer a cost-efficient approach for AI agent development.


Summary

  • The paper's main contribution is the introduction of an efficient non-parametric method to quantify uncertainty in LLM-based decision planning.
  • It employs point-wise dependency estimation and conformal prediction to robustly calibrate decision confidence using real-world data.
  • Results demonstrate improved F1 scores and mean precision, highlighting the method's efficacy in reducing hallucinations in AI agents.

Efficient Non-Parametric Uncertainty Quantification for Black-Box LLMs and Decision Planning

Introduction

The paper explores the development of AI agents using LLMs with a focus on uncertainty quantification to address the problem of hallucinations in LLMs. It introduces a non-parametric method that efficiently estimates point-wise dependencies between inputs and decisions, enhancing decision trustworthiness without accessing token logits. This approach is particularly suited for use with black-box proprietary LLMs, offering a cost-effective solution for AI agent applications.

Decision-Making Agent Design

The paper presents a design for a decision-making agent that generates actions from user requests, using uncertainty quantification to decide when to act autonomously and when to defer to the user. The design involves:

  • Data Collection: Compiling a dataset of 20,000 user requests and corresponding actions (Figure 1).

    Figure 1: A decision-making agent design. During the data collection phase, smart home actions are associated with user requests.

  • Model Training: Performing instruction fine-tuning on a robust LLM and training a point-wise dependency estimator to establish relationships between inputs and actions.
  • Deployment: Employing a statistically guaranteed decision-making process via conformal prediction, ensuring a high probability of correct action generation (Figure 2); a minimal sketch of this step follows the list.

    Figure 2: Distributions of estimated point-wise dependency between user prompt, taken actions, and current action.
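
To make the deployment phase concrete, here is a minimal sketch of one planning step under the design above. The helper names (generate_candidates, estimate_pd, ask_user) are hypothetical stand-ins for the trained components, not an API from the paper.

```python
def plan_step(prompt, taken_actions, threshold,
              generate_candidates, estimate_pd, ask_user):
    """One planning step: generate candidate actions with the black-box
    LLM, keep those whose estimated point-wise dependency clears the
    conformal threshold, and defer to the user when several remain."""
    candidates = generate_candidates(prompt, taken_actions)
    confident = [a for a in candidates
                 if estimate_pd(prompt, taken_actions, a) >= threshold]
    if len(confident) == 1:
        return confident[0]          # trustworthy: act autonomously
    if len(confident) > 1:
        return ask_user(confident)   # several plausible actions: ask the user
    return None                      # no confident action: abstain
```

This mirrors the paper's behavior of asking for user preferences only when more than one action has a high estimated point-wise dependency.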

Uncertainty Quantification Approach

The proposed method uses point-wise dependency neural estimation to measure the statistical dependence between user inputs and agent decisions. This non-parametric approach estimates the dependency with a single inference of a neural network, avoiding the repeated sampling that uncertainty estimation for black-box models typically requires.
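
As an illustration of what such an estimator can look like, below is a minimal PyTorch sketch. It assumes fixed-size embeddings of the prompt (with action history) and of the candidate action are available from some encoder; the architecture, dimensions, and the classic least-squares density-ratio objective are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PDEstimator(nn.Module):
    """Scores r(x, a) ~ p(x, a) / (p(x) p(a)) in a single forward pass."""

    def __init__(self, emb_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Softplus(),  # a density ratio is non-negative
        )

    def forward(self, x_emb: torch.Tensor, a_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_emb, a_emb], dim=-1)).squeeze(-1)

def density_ratio_loss(model, x_emb, a_emb):
    """Least-squares density-ratio fitting: joint pairs come from the
    dataset, while product-of-marginals pairs are formed by shuffling
    actions within the batch. A plain stand-in for the stabilized
    variant the paper uses."""
    joint = model(x_emb, a_emb)            # samples from p(x, a)
    perm = torch.randperm(a_emb.size(0))
    indep = model(x_emb, a_emb[perm])      # approximates p(x) p(a)
    return 0.5 * (indep ** 2).mean() - joint.mean()
```

At deployment, a single forward pass through such a network yields the dependency score for a (prompt, action) pair, which is what keeps the method cheap relative to sampling-based uncertainty estimates.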

Key components of the approach include:

  • Defining a threshold via conformal prediction on calibration data, using past decisions to inform future actions (Figure 3); a calibration sketch follows this list.

    Figure 3: Conformal prediction on calibration data with an identified threshold for action confidence.

  • Utilizing a stabilized density-ratio fitting method for training the dependency estimator, ensuring robust dependency estimation across user prompts and actions.
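
To make the calibration step concrete, here is a minimal split-conformal sketch. It assumes a held-out set of prompts paired with their correct actions, scored by the trained dependency estimator; the finite-sample quantile rule is the textbook one and may differ in detail from the paper's procedure.

```python
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Given dependency scores of calibration prompts paired with their
    correct actions, return tau such that a fresh correct action scores
    at least tau with probability >= 1 - alpha (under exchangeability)."""
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))  # rank of the cutoff score
    if k == 0:
        return float("-inf")            # too few calibration points to cut
    return float(np.sort(cal_scores)[k - 1])
```

Actions whose estimated dependency falls below tau at deployment are withheld or escalated to the user, which is what gives the "high probability of correct action generation" its statistical meaning.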

Evaluation

The evaluation focuses on comparing step-by-step decision planning with all-at-once generation strategies. Results indicate:

  • Step-by-step planning achieves superior F1 scores, leveraging historical actions for better decision accuracy.
  • A threshold on point-wise dependency significantly improves mean precision, reducing incorrect actions (a toy sketch of this selective evaluation follows the list).
  • The proposed method meets its statistical guarantees while preserving task performance, highlighting its efficacy in real-world applications.
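
To illustrate how the dependency threshold trades coverage for precision, here is a toy sketch of the selective evaluation implied above; the data handling is hypothetical.

```python
def selective_metrics(predictions, scores, labels, tau):
    """Precision over accepted actions plus coverage (fraction accepted).
    Raising tau accepts fewer actions but makes them more reliable."""
    accepted = [(p, y) for p, s, y in zip(predictions, scores, labels)
                if s >= tau]
    if not accepted:
        return float("nan"), 0.0
    precision = sum(p == y for p, y in accepted) / len(accepted)
    return precision, len(accepted) / len(predictions)
```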

Conclusion

The paper introduces an efficient method for uncertainty quantification in LLMs, enabling advanced decision-making capabilities in AI agents. By addressing scalability and integration challenges, this approach facilitates the deployment of sophisticated AI agents using proprietary LLMs. Future research can explore enhanced semantic similarity measures and integrate human studies for further validation.

Overall, the work provides a significant contribution to non-parametric methods in AI agent development, emphasizing practical applications in natural language interactions.
