
Boosting Theory-of-Mind Performance in Large Language Models via Prompting (2304.11490v3)

Published 22 Apr 2023 in cs.AI and cs.CL

Abstract: LLMs excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain of thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.

Citations (63)

Summary

  • The paper demonstrates that prompt-based methods substantially enhance Theory-of-Mind task performance in large language models, with GPT-4 reaching perfect accuracy under chain-of-thought prompting.
  • The study employs varied prompting strategies (zero-shot, step-by-step, few-shot, and chain-of-thought) to systematically compare performance across GPT-4 and the GPT-3.5 variants.
  • The research highlights the practical value of pairing RLHF-trained models with context-sensitive prompts to improve inferential reasoning, and underscores that measured LLM capabilities depend on how they are elicited.

Enhancing Theory of Mind in LLMs through Prompting

The paper "Boosting Theory-of-Mind Performance in LLMs" presents an in-depth analysis of enhancing Theory of Mind (ToM) capabilities in LLMs, particularly focusing on models from the GPT family, such as GPT-4 and GPT-3.5 variants. ToM tasks are crucial in evaluating models' abilities to comprehend mental states, beliefs, and goals of agents, thereby involving a form of complex inference critical for natural language understanding.

Methodology and Experimental Setup

The authors investigated four models: GPT-4 and the GPT-3.5 variants Davinci-2, Davinci-3, and GPT-3.5-Turbo. These models underwent different training procedures; all except Davinci-2 were fine-tuned via Reinforcement Learning from Human Feedback (RLHF), which the authors identify as an important factor in reasoning performance. The paper employed standardized ToM and control scenarios, originally used in human studies, to assess and compare model capabilities against a human baseline of 87% accuracy on the test set.

The prompting conditions compared were zero-shot, zero-shot with step-by-step thinking instructions, few-shot, and two-shot chain-of-thought (CoT) reasoning. Each condition used carefully crafted prompt examples to measure performance shifts, with accuracy tracked across scenarios and over repeated runs to ensure result reliability; a sketch of how these conditions might be assembled follows.
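The paper's exact prompt materials are not reproduced here, so the following Python sketch is illustrative only: the scenario text, the exemplars, and the condition names are assumptions standing in for the study's actual test items, not the authors' templates.

```python
# Minimal sketch of the four prompting conditions compared in the paper.
# The exemplars below are invented stand-ins for the study's materials.

FEW_SHOT = (
    "Scenario: Tom put his chocolate in the drawer and left the room. "
    "Mia then moved it to the shelf.\n"
    "Q: Where will Tom look for his chocolate?\n"
    "A: the drawer\n\n"
)

COT_SHOT = (
    "Scenario: Tom put his chocolate in the drawer and left the room. "
    "Mia then moved it to the shelf.\n"
    "Q: Where will Tom look for his chocolate?\n"
    "A: Tom saw the chocolate go into the drawer and did not see Mia move "
    "it, so his belief about its location is unchanged. He will look in "
    "the drawer.\n\n"
)

def build_prompt(condition: str, scenario: str, question: str) -> str:
    """Assemble the prompt text for one of the four conditions."""
    task = f"Scenario: {scenario}\nQ: {question}\n"
    if condition == "zero-shot":
        return task + "A:"
    if condition == "step-by-step":
        # Zero-shot plus an explicit thinking instruction.
        return task + "Let's think step by step.\nA:"
    if condition == "few-shot":
        # Worked example with the answer only, no reasoning trace.
        return FEW_SHOT + task + "A:"
    if condition == "cot":
        # Worked example that includes a full reasoning trace.
        return COT_SHOT + task + "A:"
    raise ValueError(f"unknown condition: {condition}")
```

The only difference between the few-shot and CoT conditions in this sketch is whether the exemplar's answer carries an explicit reasoning trace, which mirrors the contrast the paper draws between imitation of answers and demonstration of reasoning.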

Findings and Numerical Results

In zero-shot conditions, newer models typically performed better on control tasks but showed mixed results on ToM tasks. GPT-4 achieved roughly 80% ToM accuracy, surpassing its predecessors but still falling short of the 87% human baseline. With CoT prompting, however, most RLHF-trained models improved markedly: GPT-4 attained perfect accuracy when prompted appropriately, and both GPT-3.5-Turbo and Davinci-3 exceeded human-level ToM performance when given CoT prompts combined with step-by-step reasoning instructions.
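To make the accuracy comparison concrete, one plausible shape for the scoring loop is sketched below, reusing build_prompt from the earlier sketch. The query_model callable and the keyword-match grading are assumptions; the paper's actual grading procedure is not described on this page and may well differ.

```python
from collections import defaultdict

def accuracy_by_condition(items, conditions, query_model, n_repeats=3):
    """Score each prompting condition over repeated runs of every item.

    items: iterable of (scenario, question, expected_answer) tuples.
    query_model: caller-supplied callable, prompt -> completion string,
                 standing in for whichever model API is under test.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for scenario, question, expected in items:
        for condition in conditions:
            prompt = build_prompt(condition, scenario, question)
            for _ in range(n_repeats):  # repeated runs for reliability
                completion = query_model(prompt)
                # Crude grading: the expected answer appears in the output.
                if expected.lower() in completion.lower():
                    correct[condition] += 1
                total[condition] += 1
    return {c: correct[c] / total[c] for c in conditions}
```

Averaging over n_repeats runs per item reflects the repetition the paper used to stabilize its accuracy estimates across stochastic model outputs.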

The prompting techniques effectively improved the LLMs' inferential reasoning, and the authors argue that these gains stem from invoking a mode of systematic reasoning rather than from mere imitation of the reasoning steps shown in the prompts.

Implications and Future Directions

These results demonstrate the non-trivial role of prompting in unlocking LLM capabilities for complex tasks such as ToM reasoning. They suggest that how a question is framed, and whether the prompt instructs a reasoning process, can significantly influence model performance. The paper thus emphasizes the context-sensitive nature of large models: measured capability depends in part on how it is elicited.

This work aligns with ongoing discussions about AI reasoning capabilities and invites further exploration of general inferential tasks beyond ToM. The results encourage interdisciplinary approaches to model training and refinement, leveraging human feedback and structured reasoning frameworks to improve the reliability and performance of AI systems on socially oriented or context-dependent tasks.

Future research should explore more diverse task frameworks and different categories of inferential reasoning. Expanding the variety of prompts and testing conditions would provide deeper insights into the robustness and scalability of the presented prompting strategies in facilitating reasoning in LLMs.
