Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning (2403.05911v2)
Abstract: Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning (RL) as a general approach for modeling human-AI decision-making to optimize human-AI interaction for diverse objectives. RL can optimize such objectives by tailoring decision support, providing the right type of assistance to the right person at the right time. We instantiated our approach with two objectives, human-AI accuracy on the decision-making task and human learning about the task, and learned decision support policies from previous human-AI interaction data. We compared the optimized policies against several baselines in AI-assisted decision-making. Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieved significantly better accuracy -- and even human-AI complementarity -- compared to those interacting with any other type of AI support. Our results further indicated that human learning was more difficult to optimize than accuracy, with participants who interacted with learning-optimized policies showing significant learning improvement only at times. Our research (1) demonstrates offline RL to be a promising approach for modeling human-AI decision-making, leading to policies that may optimize human-centric objectives and provide novel insights about the AI-assisted decision-making space, and (2) emphasizes the importance of considering human-centric objectives beyond decision accuracy in AI-assisted decision-making, opening up the novel research challenge of optimizing human-AI interaction for such objectives.
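The core idea of learning a decision-support policy offline from logged human-AI interaction data can be illustrated with a minimal tabular Q-learning sketch. This is a hypothetical toy example, not the paper's actual implementation: the state discretization, action set (types of assistance), and reward signal are all assumptions for illustration.

```python
import numpy as np

# Hypothetical setup: states are coarse user/context profiles, actions are
# types of AI assistance to offer, rewards encode an objective such as
# decision correctness or a learning signal. All values are illustrative.
N_STATES, N_ACTIONS = 4, 3
GAMMA, ALPHA = 0.9, 0.1

# Logged human-AI interaction tuples: (state, action, reward, next_state).
# Offline RL learns only from this fixed dataset, with no new interaction.
logged = [(0, 1, 1.0, 2), (2, 0, 0.0, 3), (3, 2, 1.0, 1), (1, 1, 1.0, 0)]

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):  # repeated sweeps over the fixed dataset
    for s, a, r, s_next in logged:
        target = r + GAMMA * Q[s_next].max()
        Q[s, a] += ALPHA * (target - Q[s, a])

# Greedy decision-support policy: which type of assistance to offer per state.
policy = Q.argmax(axis=1)
```

In practice, offline RL methods must additionally guard against extrapolating to state-action pairs unseen in the logged data (e.g., via conservative or constrained objectives), which this sketch omits.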
- Zana Buçinca
- Siddharth Swaroop
- Amanda E. Paluch
- Susan A. Murphy
- Krzysztof Z. Gajos