Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies (2403.12108v3)
Abstract: The use of AI, or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases, and especially when stakes are high, humans still make the final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases, with humans making the final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system encompasses any individualized treatment assignment rule, including rules that were not used in the original study. We also show when AI recommendations should be provided to a human decision maker, and when the human should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms--the risk assessment score and an LLM in particular--leads to worse classification performance.
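The core comparison described in the abstract can be illustrated with a minimal simulation. This is a hypothetical sketch, not the paper's estimators: all variable names and data-generating probabilities are invented, and for simplicity it assumes the baseline potential outcome is observed for every case, sidestepping the identification issues the framework is built to handle. Because the indicator `z` (whether the AI recommendation was shown) is randomized, the two arms are comparable, and arm-specific accuracies estimate the performance of the human-alone and human-with-AI systems; any algorithmic rule can likewise be scored against the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated data (illustrative names, not the paper's notation):
# z: randomized indicator that the AI recommendation was shown to the judge
# y: binary baseline potential outcome (e.g., failure to appear if released);
#    observed for all cases here only because the data are simulated
# a: binary AI recommendation (e.g., recommend cash bail), partly tracking y
# d: judge's binary decision, leaning on the recommendation when it is shown
z = rng.integers(0, 2, n)
y = rng.binomial(1, 0.3, n)
a = rng.binomial(1, np.where(y == 1, 0.6, 0.35))
d = rng.binomial(1, np.where(z == 1,
                             0.5 * a + 0.3,                 # shown: follows AI
                             np.where(y == 1, 0.55, 0.4)))  # not shown

def accuracy(decision, outcome):
    """Fraction of cases where the decision matches the outcome."""
    return np.mean(decision == outcome)

# Randomization of z makes the arms comparable, so each arm's accuracy
# estimates the performance of the corresponding decision-making system.
acc_human_alone = accuracy(d[z == 0], y[z == 0])
acc_human_ai    = accuracy(d[z == 1], y[z == 1])
acc_ai_alone    = accuracy(a, y)   # any assignment rule can be evaluated

print(f"human alone:   {acc_human_alone:.3f}")
print(f"human with AI: {acc_human_ai:.3f}")
print(f"AI alone:      {acc_ai_alone:.3f}")
```

In the actual framework, the baseline potential outcome is not observed for every case (e.g., outcomes are only seen for released arrestees), so the paper derives bounds and estimators under the single-blinded, unconfounded design rather than computing raw accuracies as above.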