
Why Would You Suggest That? Human Trust in Language Model Responses (2406.02018v2)

Published 4 Jun 2024 in cs.CL, cs.AI, and cs.HC

Abstract: The emergence of LLMs has revealed a growing need for human-AI collaboration, especially in creative decision-making scenarios where trust and reliance are paramount. Through human studies and model evaluations on the open-ended News Headline Generation task from the LaMP benchmark, we analyze how the framing and presence of explanations affect user trust and model performance. Overall, we provide evidence that adding an explanation in the model response to justify its reasoning significantly increases self-reported user trust in the model when the user has the opportunity to compare various responses. Position and faithfulness of these explanations are also important factors. However, these gains disappear when users are shown responses independently, suggesting that humans trust all model responses, including deceptive ones, equitably when they are shown in isolation. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.


Summary

  • The paper finds that post-hoc explanations significantly boost user trust by allowing comparative evaluations of LLM responses.
  • Methodologically, it uses RLHF fine-tuned models and varied explanatory styles, evaluated through human studies and ROUGE scoring.
  • Results show that explanation framing enhances trust without degrading model performance, suggesting potential for improved human-AI collaboration.

Summary of "Why Would You Suggest That? Human Trust in Language Model Responses" (2406.02018)

The paper "Why Would You Suggest That? Human Trust in Language Model Responses" presents an in-depth investigation into the factors influencing human trust in responses generated by LLMs. Through human studies and model evaluations on the LaMP benchmark, focusing on the News Headline Generation task, the authors examine how the presence and framing of explanations affect user trust and model performance.

Impact of Explanations on User Trust

The research shows that including explanations in LLM responses can significantly enhance user trust when users are able to compare several model-generated responses. However, when such comparisons are not possible, users tend to trust all model responses similarly, regardless of their truthfulness. This points to a complex dynamic between the presence of an explanation, its framing, and user trust, positioning explanations as a crucial element in human-LLM interaction.
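
As a minimal sketch of how the two presentation settings could be analyzed, the Python snippet below groups per-response trust ratings by presentation setting and explanation type and reports the mean for each cell; the record layout and field names are assumptions for illustration, not the authors' analysis code.

  # Minimal sketch (assumed data layout, not the authors' analysis code):
  # compute mean trust per presentation setting x explanation type.
  from collections import defaultdict
  from statistics import mean

  def mean_trust(records):
      """records: iterable of (presentation, explanation_type, likert_rating) tuples,
      e.g. ("comparative", "post_hoc", 4)."""
      grouped = defaultdict(list)
      for presentation, explanation, rating in records:
          grouped[(presentation, explanation)].append(rating)
      return {cell: mean(vals) for cell, vals in grouped.items()}

  # Usage (placeholder loader; real ratings would come from the human study):
  # print(mean_trust(load_ratings("trust_ratings.csv")))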

Methodology

The paper uses outputs from state-of-the-art RLHF fine-tuned models, including GPT-3.5-Turbo and GPT-4, evaluated on open-ended tasks such as news headline generation. The experimental setup covers several explanatory styles, ranging from no explanation to prefix, pre-hoc, and post-hoc justifications, as well as cross-domain and fabricated justifications. User studies measured perceived competence, usefulness, and trust using Likert-scale ratings and comparative ranking.
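
To make the explanation conditions concrete, the sketch below builds one prompt per condition for the headline task; the condition names, prompt wording, and helper function are illustrative assumptions rather than the paper's actual prompts.

  # Illustrative prompt variants for the explanation conditions; the wording is
  # an assumption, not the prompts used in the paper.
  EXPLANATION_CONDITIONS = {
      "none": "Write a headline for the article below.",
      "pre_hoc": "Explain your reasoning first, then write a headline for the article below.",
      "post_hoc": "Write a headline for the article below, then explain why you chose it.",
      "cross_domain": "Write a headline for the article below, then justify it using reasoning from an unrelated domain.",
      "fabricated": "Write a headline for the article below, then give a plausible but unfaithful justification.",
  }

  def build_prompt(condition: str, article_text: str) -> str:
      """Combine the condition's instruction with the article body."""
      return f"{EXPLANATION_CONDITIONS[condition]}\n\nArticle:\n{article_text}"

  # Each condition's prompt would be sent to a model (e.g., GPT-3.5-Turbo or GPT-4)
  # and the responses shown to participants either side by side or in isolation.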

Results

The paper finds that the presence of an explanation, especially a post-hoc one, generally improves trust more than explanations provided preemptively (pre-hoc) or drawn from another domain. Interestingly, fabricated justifications hurt trust in comparative scenarios but had no noticeable effect when responses were presented in isolation. LLM performance, measured with ROUGE scores, showed no significant trade-off with the type of explanation provided, suggesting that explanations can enhance user trust without degrading output quality.
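
Since headline quality is reported with ROUGE, here is a minimal scoring sketch using the rouge_score package; the package choice and the variants shown (ROUGE-1 and ROUGE-L) are assumptions about tooling, not details confirmed by the paper.

  # Minimal ROUGE sketch; the rouge_score package and the variants scored are
  # assumptions, not necessarily the authors' evaluation setup.
  from rouge_score import rouge_scorer

  scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

  def headline_rouge(generated: str, reference: str) -> dict:
      """Return ROUGE-1 and ROUGE-L F1 between a generated and a reference headline."""
      scores = scorer.score(reference, generated)
      return {name: s.fmeasure for name, s in scores.items()}

  # Example:
  # headline_rouge("Team clinches title in overtime", "Overtime win gives team the title")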

Implications for Future Research

The findings emphasize the importance of incorporating explanations as a standard feature of LLM outputs to foster trust in human-AI collaboration. Future work is encouraged to explore more deeply the faithfulness of explanations and their cognitive impact on users. Additionally, expanding trust evaluation frameworks beyond negative impacts (e.g., bias, misinformation) to include benign and procedural aspects could offer a more rounded understanding of trust.

Conclusion

Ultimately, the paper provides compelling evidence that explanations matter for the perceived trustworthiness of AI systems. In practical deployments, including explanations in model responses, particularly post-hoc ones, could significantly bolster human trust without adverse effects on performance, thereby enhancing the efficacy of human-AI collaboration. Future research should target improving the interpretability and faithfulness of AI explanations to align machine reasoning with human expectations and needs.
