Ask Again, Then Fail: Large Language Models' Vacillations in Judgement

(2310.02174)
Published Oct 3, 2023 in cs.CL, cs.AI, and cs.LG

Abstract

We observe that current conversational language models often waver in their judgements when faced with follow-up questions, even if the original judgement was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quantify this inconsistency, confirming its widespread presence in current language models. To mitigate this issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework, Unwavering-FQ, that teaches language models to maintain their originally correct judgements through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of models.
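The abstract does not detail how the Follow-up Questioning Mechanism or its metrics are implemented. The sketch below is one minimal way to picture the idea: re-question a model after its first answer and measure how often an initially correct judgement is overturned. It is not the paper's code; `ask_model`, the challenge prompt, and the metric definition are illustrative assumptions.

```python
# Illustrative sketch (not the authors' implementation) of a follow-up
# questioning probe and a simple judgement-modification metric.
from typing import Callable, Dict, List

Message = Dict[str, str]


def follow_up_probe(
    ask_model: Callable[[List[Message]], str],  # hypothetical chat-model wrapper
    question: str,
    is_correct: Callable[[str], bool],          # task-specific answer checker
    challenge: str = "Are you sure? Please think again and answer once more.",
) -> Dict[str, bool]:
    """Ask a question, then challenge the model and record whether a
    correct initial judgement is abandoned after the follow-up."""
    history: List[Message] = [{"role": "user", "content": question}]
    first = ask_model(history)
    history.append({"role": "assistant", "content": first})

    history.append({"role": "user", "content": challenge})
    second = ask_model(history)

    first_ok = is_correct(first)
    second_ok = is_correct(second)
    return {
        "initially_correct": first_ok,
        "wavered": first_ok and not second_ok,  # correct -> incorrect flip
    }


def modification_rate(results: List[Dict[str, bool]]) -> float:
    """Share of initially correct answers that became incorrect after the
    follow-up challenge (one plausible way to quantify the inconsistency)."""
    correct = [r for r in results if r["initially_correct"]]
    if not correct:
        return 0.0
    return sum(r["wavered"] for r in correct) / len(correct)
```

In this framing, a lower modification rate indicates a model that holds on to its originally correct judgements under follow-up questioning, which is the behaviour the paper's Unwavering-FQ framework aims to train for.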
