DeLLMa: Decision Making Under Uncertainty with Large Language Models (2402.02392v3)

Published 4 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: The potential of LLMs as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty. In this paper, we show that directly prompting LLMs on these types of decision-making problems can yield poor results, especially as the problem complexity increases. To aid in these tasks, we propose DeLLMa (Decision-making LLM assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step reasoning procedure that integrates recent best practices in scaling inference-time reasoning, drawing upon principles from decision theory and utility theory, to provide an accurate and human-auditable decision-making process. We validate our procedure on multiple realistic decision-making environments, demonstrating that DeLLMa can consistently enhance the decision-making performance of leading LLMs, and achieve up to a 40% increase in accuracy over competing methods. Additionally, we show how performance improves when scaling compute at test time, and carry out human evaluations to benchmark components of DeLLMa.


Summary

  • The paper presents DeLLMa, which integrates classical decision theory into LLM prompting to enhance decision making under uncertainty.
  • It employs a multi-phase workflow—state enumeration, forecasting, utility elicitation, and expected utility maximization—to produce transparent, audit-friendly outcomes.
  • Empirical evaluations demonstrate up to a 40% accuracy improvement over traditional prompting methods in high-stakes, uncertain decision environments.

DeLLMa: A Structured Approach for Decision Making Under Uncertainty with LLMs

The paper "DeLLMa: Decision Making Under Uncertainty with LLMs" (2402.02392) introduces DeLLMa, a principled framework designed to enable LLMs to make transparent and more optimal decisions under uncertainty by scaffolding LLM reasoning with classical decision-theoretic principles. This work addresses notable shortcomings of prevailing prompting strategies when applied to high-stakes decision tasks involving complex uncertainties, and proposes a multi-phase workflow to operationalize utility-maximizing decision support with LLMs.

Motivation and Problem Formulation

LLM adoption spans business, finance, agriculture, and many other domains where practitioners must act under significant uncertainty. However, direct prompting approaches, such as zero-shot, self-consistency (SC), and chain-of-thought (CoT) prompting, routinely fail to handle uncertainty in a principled manner: they tend to be overconfident, fail to forecast possible states, or disregard user-specific objectives.

DeLLMa's design is motivated by:

  • The inherent irrationality and bias in human and LLM decision making when formal probabilistic reasoning frameworks (e.g., expected utility theory) are absent.
  • The need for human auditability: transparent, step-wise reasoning and justifications underpinning each suggested action, critical for trust and practical deployment.

The decision problem is formalized as selecting from a discrete action set $\mathcal{A}$ in the face of latent states $\Theta$ with unknown realizations, while optimizing a user-specific utility function $U: \Theta \times \mathcal{A} \to \mathbb{R}$.
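Under expected utility theory, the normative target that DeLLMa approximates is the action with the greatest expected utility under the belief over states. Writing the state belief as $p(\theta)$, this objective is

$$a^* = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}_{\theta \sim p(\theta)}\left[ U(\theta, a) \right].$$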

DeLLMa Framework

The DeLLMa pipeline is organized into the following four phases, corresponding to classical normative models of rational choice:

  1. State Enumeration: The LLM, given a prompt comprising a user goal $\mathcal{G}$, action space $\mathcal{A}$, and contextual data $\mathcal{C}$, generates latent factors believed to drive outcomes (e.g., climate variables or economic indicators). Each latent factor is discretized into plausible values, yielding a combinatorial state space.
  2. State Forecasting: The LLM is tasked with assigning verbal probability scores (mapped to numeric values) to each latent variable value, based on the context, assuming factor independence. This process induces a joint (though factored) probability distribution over states, from which samples can be efficiently drawn.
  3. Utility Function Elicitation: To estimate user preferences, the LLM is prompted to rank batches of state-action pairs, sampled from the forecasted state distribution and actions. Rankings are translated into pairwise comparisons, which parameterize a classical Bradley-Terry model for estimating utilities. Overlapping minibatches and variance-reduction strategies ensure more robust elicitation despite LLMs' variable quantitative accuracy.
  4. Expected Utility Maximization: The elicited utility surface, together with the state belief distribution, supports estimation of each action's expected utility via Monte Carlo sampling. The action maximizing expected utility is returned, along with a human-readable audit of the reasoning chain (a code sketch of phases 2-4 appears after this list).
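To make the pipeline concrete, the following is a minimal sketch of how phases 2-4 compose, under illustrative assumptions: the latent factors and their probabilities are toy values, `mock_llm_rank` is a hypothetical stand-in for the LLM ranking prompt, and the state and batch counts are arbitrary; none of these names or settings come from the paper. The utility fit uses Hunter's standard MM updates for the Bradley-Terry model.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(0)

# --- Phase 2: state forecasting ---------------------------------------------
# Toy latent factors with probabilities standing in for the LLM's verbal
# scores after mapping to numbers; factors are assumed independent.
state_belief = {
    "weather":       {"favorable": 0.5, "average": 0.3, "adverse": 0.2},
    "market_demand": {"high": 0.4, "steady": 0.4, "low": 0.2},
}
actions = ["plant_apples", "plant_grapes", "plant_pears"]

def sample_state(belief):
    """Draw one joint state from the factored belief distribution."""
    return tuple(
        random.choices(list(factor), weights=list(factor.values()))[0]
        for factor in belief.values()
    )

def mock_llm_rank(batch):
    """Hypothetical stand-in for the LLM ranking prompt: orders state-action
    pairs from most to least preferred. A real implementation would serialize
    the user goal and each pair into a prompt and parse the LLM's ranking."""
    base = {"plant_apples": 2.0, "plant_grapes": 1.0, "plant_pears": 0.5}
    def noisy_score(item):
        (weather, demand), action = item
        bonus = (weather == "favorable") + (demand == "high")
        return base[action] + bonus + random.gauss(0, 0.5)  # noisy judgment
    return sorted(batch, key=noisy_score, reverse=True)

# --- Phase 3: utility elicitation via a Bradley-Terry model -----------------
states = [sample_state(state_belief) for _ in range(8)]  # Monte Carlo states
items = [(s, a) for s in states for a in actions]        # state-action pairs

wins = defaultdict(int)                                  # pairwise outcomes
for _ in range(80):                                      # overlapping minibatches
    ranked = mock_llm_rank(random.sample(items, k=6))
    for i, winner in enumerate(ranked):
        for loser in ranked[i + 1:]:
            wins[(winner, loser)] += 1

def fit_bradley_terry(items, wins, iters=200):
    """Hunter's MM iteration for Bradley-Terry strengths (the utilities)."""
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new_p = {}
        for i in items:
            total_wins = sum(wins[(i, j)] for j in items if j != i)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (p[i] + p[j])
                for j in items
                if j != i and wins[(i, j)] + wins[(j, i)] > 0
            )
            new_p[i] = total_wins / denom if denom > 0 else p[i]
        norm = sum(new_p.values()) / len(items)           # fix the scale
        p = {i: v / norm for i, v in new_p.items()}
    return p

utility = fit_bradley_terry(items, wins)

# --- Phase 4: expected utility maximization ----------------------------------
expected_utility = {a: mean(utility[(s, a)] for s in states) for a in actions}
best_action = max(expected_utility, key=expected_utility.get)
print(expected_utility)
print("recommended action:", best_action)
```

Note that the joint state space ($v^k$ states for $k$ factors with $v$ values each) is never enumerated: phases 2 and 4 only ever sample from the factored belief, which is what keeps the procedure tractable as the number of latent factors grows.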

Empirical Evaluation

Experiments with real-world datasets from agriculture (USDA reports on fruit production/market data) and finance (stock selection using historical price tables) demonstrate strong numerical results. Key findings include:

  • Accuracy Improvement: Across both environments, DeLLMa variants consistently outperform zero-shot, SC, and CoT baselines. In the agriculture decision set, DeLLMa achieves up to a 40% increase in accuracy over alternatives as the action space grows.
  • Utility Normalization: The utility of chosen actions, normalized against the offline ground-truth optimum, is significantly higher for DeLLMa variants (one way to write this metric appears after this list).
  • Failure Modes of Baselines: Standard prompting baselines underperform random choice in large action spaces, often parroting contextual sentiment or failing to handle counterfactual scenarios. DeLLMa's explicit enumeration of uncertainty and systematic counterfactual reasoning overcomes these weaknesses.
  • Variance Reduction and Batching: The advanced DeLLMa-Pairs and DeLLMa-Top1 strategies, leveraging variance reduction, provide tangible gains over naive approaches; however, in high-volatility environments such as stocks, overly aggressive pairwise ranking can introduce noise, sometimes making Top1 preferable.
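One natural way to express the normalized-utility metric referenced above (an illustrative formulation, not necessarily the paper's exact definition) is the ratio of the ground-truth utility of the chosen action to that of the offline optimum,

$$\tilde{U}(a_{\text{chosen}}) = \frac{U^{\text{gt}}(a_{\text{chosen}})}{\max_{a \in \mathcal{A}} U^{\text{gt}}(a)},$$

so that a score of 1 means the method recovered the best available action.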

Implementation Details and Practical Considerations

DeLLMa is implemented using current LLM APIs (e.g., GPT-4). Each phase is realized via explicit, modular prompts. The approach is highly auditable: intermediate outputs—state lists, probability assignments, and utility rankings—can be inspected individually. This enables human oversight and facilitates integration into decision support systems where regulatory and practical interpretability constraints are strict.
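As an illustration of this modularity, a state-forecasting prompt might be assembled as below. The template wording, helper names, and the verbal-to-numeric mapping are hypothetical stand-ins, not the paper's actual prompts.

```python
# Hypothetical template for the state-forecasting phase; the wording is
# illustrative, not DeLLMa's actual prompt.
FORECAST_PROMPT = """\
You are assisting with the decision: {goal}

Context:
{context}

For each latent factor value below, rate its likelihood using exactly one
of: very unlikely, unlikely, somewhat likely, likely, very likely.

Factors:
{factor_list}
"""

# Assumed mapping from verbal scores to numeric values.
VERBAL_TO_NUMERIC = {
    "very unlikely": 0.05, "unlikely": 0.2, "somewhat likely": 0.4,
    "likely": 0.6, "very likely": 0.8,
}

prompt = FORECAST_PROMPT.format(
    goal="choose which fruit to plant next season",
    context="(relevant excerpts from USDA reports would be inserted here)",
    factor_list="- weather: favorable / average / adverse\n"
                "- market_demand: high / steady / low",
)

# The LLM's verbal answers are mapped to numbers and renormalized per factor,
# turning each factor into a proper categorical distribution.
answers = {"favorable": "likely", "average": "somewhat likely", "adverse": "unlikely"}
raw = {value: VERBAL_TO_NUMERIC[verbal] for value, verbal in answers.items()}
weather_belief = {value: x / sum(raw.values()) for value, x in raw.items()}
```

Because every phase is a separate, inspectable prompt and response, an auditor can check the enumerated factors, the probability assignments, and the rankings independently before trusting the final recommendation.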

Computational Considerations:

  • The framework scales linearly with action set size, number of latent factors, and batching overlap. API usage is likewise linear: long prompts and multiple calls per decision instance are required, especially during utility elicitation (a worked call count follows this list).
  • In environments with extremely large state/action spaces, prompt engineering and retrieval-augmented generation should be considered to manage tractability and cost.
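A back-of-envelope count makes this scaling concrete; all numbers below are illustrative assumptions, not measurements from the paper.

```python
# Illustrative cost accounting for one decision instance (assumed settings).
num_factors, values_per_factor = 5, 3
joint_states = values_per_factor ** num_factors       # 243 states, never enumerated
enumeration_calls = 1                                 # phase 1: list latent factors
forecast_calls = 1                                    # phase 2: score all factor values
num_batches = 80                                      # phase 3: one ranking call per batch
total_calls = enumeration_calls + forecast_calls + num_batches
print(joint_states, total_calls)                      # 243 82
```

The call count grows with the number of ranking minibatches rather than with the exponentially large joint state space, which is the practical upshot of the factored, sampling-based design.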

Limitations:

  • The independence assumption among latent factors simplifies inference but may fail in tightly coupled domains.
  • Utility elicitation relies on the LLM's ability to produce consistent and accurate rankings—still a challenge for quantitatively complex or highly ambiguous tasks.
  • Residual LLM hallucinations and overconfidence remain sources of error, especially in volatile, data-sparse environments.

Implications and Future Directions

Practical Implications:

  • DeLLMa offers a template for deploying LLMs as decision support agents where uncertainty and user-defined utilities are central—relevant in domains such as medical treatment selection, supply chain optimization, policy evaluation, and high-stakes investment strategies.
  • The transparency and modularity of the workflow support rigorous audits, regulatory compliance, and trustworthy human-in-the-loop deployments.

Theoretical Implications:

  • The work provides concrete evidence that classical decision-theoretic formalisms can scaffold LLM reasoning in settings where standard prompting methods collapse, suggesting a synergy between symbolic and neural paradigms for structured reasoning.
  • It motivates further inquiry into integrating statistical inference (e.g., posterior computation) and utility learning with generative LLMs.

Prospective Developments:

  • Extensions to continuous action spaces, richer state correlations, and more general utility representations (e.g., learned via reinforcement learning or explicit preference querying) are natural progressions.
  • Adoption in portfolios of actions (e.g., multi-asset financial decisions, combinatorial optimization in logistics) and dynamic, multi-step sequential decision processes is a compelling trajectory.
  • Systematic calibration of LLM uncertainty quantification and deeper integration with external data and analytical tools can further solidify decision quality and robustness.

In summary, DeLLMa charts a structured, interpretable path for leveraging LLMs in decision support under uncertainty, revealing marked advantages over unstructured prompting and establishing a foundation for further practical and theoretical advances in language-based decision automation.