
Abstract

Because LLMs are nondeterministic, the same input can yield different outputs, some of which may be incorrect or hallucinated; run again, the model may correct itself and produce the right answer. Unfortunately, most LLM-powered systems present a single result, which users tend to accept whether or not it is correct. Having the LLM produce multiple outputs may help surface disagreements or alternatives, but it is not obvious how users will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistency. Based on these categories, we conducted a study (N=252) in which participants read one or more LLM-generated passages answering an information-seeking question. We found that inconsistency among multiple LLM-generated outputs lowered participants' perceived AI capacity while increasing their comprehension of the given information. This positive effect of inconsistency was most pronounced for participants who read two passages rather than three. Based on these findings, we present design implications: instead of treating LLM output inconsistencies as a drawback, systems can reveal potential inconsistencies to transparently indicate the limitations of these models and to promote critical LLM usage.

Figure: Mean values and 95% confidence intervals for perceived AI capacity and comprehension across experimental conditions.

Overview

  • Researchers analyzed how variations in AI-generated responses influence users' trust in AI and their comprehension of the presented information.

  • Participants saw one, two, or three AI-generated texts with varying levels of consistency, and researchers measured the resulting changes in trust (perceived AI capacity) and understanding (comprehension).

  • Findings suggest that two conflicting AI-generated responses can improve user understanding by encouraging deeper engagement, while three responses may push users toward the majority answer even when it is incorrect.

Understanding AI Output Variance: Insights from Multiple Responses

The Impact of Multiple AI Outputs

Imagine you're using a language model like ChatGPT to answer a complex question. Typically, you'd get one response and take it at face value. But what if you received multiple, potentially conflicting answers? Does this make you trust the AI less or dive deeper into the topic to understand better? Researchers have explored these questions by examining how different numbers of AI-generated responses and their consistency influence user perception of AI reliability and their understanding of the information presented.
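
To make the setup concrete, the sketch below shows one way to elicit several independent answers to the same question. It assumes the OpenAI Python client (openai>=1.0); the model name, question, and disagreement check are illustrative placeholders, not the study's actual procedure.

```python
# A minimal sketch of eliciting several independent answers to one question.
# Assumes the OpenAI Python client (openai>=1.0); the model name, question,
# and disagreement check are illustrative, not the study's actual procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "Which planet in the solar system has the most moons?"
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": question}],
    n=3,                  # request three independent completions
    temperature=1.0,      # nonzero temperature lets outputs vary
)

passages = [choice.message.content for choice in response.choices]
for i, passage in enumerate(passages, start=1):
    print(f"Passage {i}: {passage}\n")

# Crude check: distinct normalized texts suggest the passages disagree.
if len({p.strip().lower() for p in passages}) > 1:
    print("The passages are inconsistent -- worth reading critically.")
```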

Study Summary

Participants were divided into groups that saw one, two, or three AI-generated passages in response to an information-seeking question. Each group experienced varying degrees of consistency between the passages. The study aimed to observe changes in participants' trust in the AI (perceived AI capacity) and their ability to understand the information provided (comprehension).

Key Findings on Perceived AI Capacity and Comprehension

  • Perceived AI Capacity: Inconsistencies between the passages generally decreased participants' trust in the AI. Interestingly, when given three passages, participants tended to rely on the majority answer even when it was incorrect, suggesting that more outputs do not necessarily lead to more accurate judgments.
  • Comprehension: Participants who received two slightly conflicting passages tended to understand the material better compared to those who received either one or three passages. This suggests that a moderate level of conflict may encourage deeper engagement with the content without overwhelming the reader.

Surprising Insights

The two-passage setup not only minimized blind trust in AI-generated content but also encouraged a more thorough evaluation of the information. However, the study revealed that too many passages (as in the three-passage scenario) can lead to confusion or reliance on a potentially misleading majority answer.

Implications for AI Design and Interaction

The findings suggest several design strategies for AI and machine learning systems:

  1. Presenting Multiple Perspectives: Offering two varying responses could foster a more critical assessment of and engagement with AI-generated content (a minimal interface sketch follows this list).
  2. Transparency: Clearly indicating when responses are generated from AI and explaining why discrepancies may occur can help manage expectations and encourage a more analytical approach to AI interactions.
  3. Cognitive Load Management: Care must be taken not to overwhelm users with too much information, which could reduce the effectiveness of the AI interaction.
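
As a rough illustration of the first and third strategies, the following sketch shows a presentation layer that surfaces a second response only when it meaningfully disagrees with the first, so that agreement costs the reader nothing extra. It assumes the sentence-transformers library; the embedding model and the 0.8 similarity threshold are arbitrary illustrative choices, not values from the paper.

```python
# A minimal sketch of strategies 1 and 3: show a second AI response only
# when it meaningfully disagrees with the first. Assumes the
# sentence-transformers library; the embedding model and the 0.8 threshold
# are arbitrary illustrative choices, not values from the paper.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def present(passage_a: str, passage_b: str, threshold: float = 0.8) -> str:
    """Return one passage if the two agree, or both with a caveat if not."""
    embeddings = encoder.encode([passage_a, passage_b], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    if similarity >= threshold:
        return passage_a  # near-duplicates: one passage keeps load low
    return (
        "These AI-generated answers disagree; read both critically.\n\n"
        f"Answer A: {passage_a}\n\nAnswer B: {passage_b}"
    )
```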

Future Research Directions

The study prompts several questions for future research:

  • Beyond Text-Based Responses: Would these findings hold true for other forms of AI-generated content, such as images or videos?
  • Long-Term Interaction Effects: How does repeated exposure to consistent vs. inconsistent AI responses affect user trust and comprehension over time?
  • Impact of Initial Expectations: How does a user's prior belief about an AI's accuracy affect their response to consistency or lack thereof in AI outputs?

Understanding these dynamics can further refine how we design interactive AI systems that are both helpful and trustworthy, enhancing the human-AI interaction experience. Additionally, as AI continues to integrate into various aspects of daily life, adapting these findings to different contexts and user needs will be crucial in developing versatile, reliable AI tools.
