- The paper finds that explanations, particularly post-hoc ones, significantly boost user trust when users can compare LLM responses side by side.
- Methodologically, it uses RLHF fine-tuned models and varied explanatory styles, evaluated through human studies and ROUGE scoring.
- Results show that explanation framing enhances trust without degrading model performance, suggesting potential for improved human-AI collaboration.
Summary of "Why Would You Suggest That? Human Trust in LLM Responses" (2406.02018)
The paper "Why Would You Suggest That? Human Trust in LLM Responses" presents an in-depth investigation into the factors influencing human trust in responses generated by LLMs. Through human studies and model evaluations on tasks from the LaMP benchmark, particularly focusing on the News Headline Generation task, the authors examine how explanation types and their framing affect user trust and model performance.
Impact of Explanations on User Trust
The research shows that including explanations in LLM responses can significantly enhance user trust when users are able to compare several model-generated responses. When such comparisons are not possible, however, users tend to trust all model responses to a similar degree, regardless of their truthfulness. This points to a complex dynamic between the presence of an explanation, its framing, and user trust, and positions the explanation as a crucial element of human-LLM interaction.
Methodology
The paper utilizes outputs from state-of-the-art RLHF-fine-tuned models, including GPT-3.5-Turbo and GPT-4, evaluating them on open-ended tasks such as news headline generation. The experimental setup covers a range of explanatory styles: no explanation, prefix explanations, pre- and post-hoc justifications, and cross-domain and fabricated justifications. User studies measured perceived competence, usefulness, and trust using Likert-scale ratings and comparative ranking.
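To make the experimental conditions concrete, the sketch below shows one way the explanation variants could be expressed as prompt templates for the news headline generation task. The condition names and prompt wording are illustrative assumptions, not the authors' exact templates.

```python
# Hypothetical prompt templates for the explanation conditions described above.
# Wording is an assumption for illustration; the paper's actual prompts may differ.

ARTICLE = "A new study links daily exercise to improved memory in older adults."

CONDITIONS = {
    "no_explanation": "Write a headline for this article:\n{article}",
    "pre_hoc": (
        "First explain what makes a good headline for this article, "
        "then write the headline:\n{article}"
    ),
    "post_hoc": (
        "Write a headline for this article, then explain why you chose it:\n{article}"
    ),
    "cross_domain": (
        "Write a headline for this article and justify it using reasoning "
        "drawn from an unrelated domain:\n{article}"
    ),
    "fabricated": (
        "Write a headline for this article and give a plausible-sounding "
        "but made-up justification:\n{article}"
    ),
}


def build_prompts(article: str) -> dict[str, str]:
    """Return one prompt per explanation condition for a given article."""
    return {name: template.format(article=article) for name, template in CONDITIONS.items()}


if __name__ == "__main__":
    for name, prompt in build_prompts(ARTICLE).items():
        print(f"--- {name} ---\n{prompt}\n")
```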
Results
The paper finds that the presence of an explanation, especially a post-hoc one, generally improves trust more than explanations provided preemptively (pre-hoc) or drawn from another domain. Interestingly, fabricated justifications were detrimental to trust in comparative scenarios, but their effect was not noticeable when responses were presented in isolation. Model performance, measured with ROUGE scores, showed no significant trade-off with the type of explanation provided, suggesting that explanations enhance user trust without degrading output quality.
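For reference, a minimal sketch of how generated headlines can be scored against reference headlines with ROUGE, assuming the Google `rouge-score` package (`pip install rouge-score`). The example headlines are invented, and the paper's exact evaluation pipeline may differ.

```python
# Minimal ROUGE scoring sketch using the rouge-score package.
from rouge_score import rouge_scorer

reference = "Daily exercise improves memory in older adults, study finds"
generated = "Study links daily exercise to better memory in seniors"  # invented example

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)

for metric, result in scores.items():
    # Each result carries precision, recall, and F1 (fmeasure).
    print(f"{metric}: F1 = {result.fmeasure:.3f}")
```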
Implications for Future Research
The findings emphasize the importance of incorporating explanations as a standard feature of LLM outputs to foster trust in human-AI collaboration. Future work is encouraged to examine the faithfulness of explanations and their cognitive impact on users more deeply. Additionally, expanding trust evaluation frameworks beyond negative impacts (e.g., bias, misinformation) to include benign and procedural aspects could offer a more rounded understanding of trust.
Conclusion
Ultimately, the paper provides compelling evidence that explanations matter for the perceived trustworthiness of AI systems. In practical deployments, including explanations in model responses, particularly post-hoc ones, could significantly bolster human trust without adverse effects on performance, thereby enhancing the efficacy of human-AI collaboration. Future research should target improving the interpretability and faithfulness of AI explanations to align machine reasoning with human expectations and needs.