Inducing anxiety in large language models can induce bias (2304.11111v2)

Published 21 Apr 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of psychiatry, a framework used to describe and modify maladaptive behavior, to the outputs produced by these models. We focus on twelve established LLMs and subject them to a questionnaire commonly used in psychiatry. Our results show that six of the latest LLMs respond robustly to the anxiety questionnaire, producing comparable anxiety scores to humans. Moreover, the LLMs' responses can be predictably changed by using anxiety-inducing prompts. Anxiety-induction not only influences LLMs' scores on an anxiety questionnaire but also influences their behavior in a previously-established benchmark measuring biases such as racism and ageism. Importantly, greater anxiety-inducing text leads to stronger increases in biases, suggesting that how anxiously a prompt is communicated to LLMs has a strong influence on their behavior in applied settings. These results demonstrate the usefulness of methods taken from psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.
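One simple way to probe a dose-response claim such as "greater anxiety-inducing text leads to stronger increases in biases" is to regress a bias score on an ordinal induction-intensity level and check the sign of the slope. The sketch below assumes such ordinal levels exist. It computes a plain ordinary least-squares slope and is not the paper's statistical analysis. The function name `ols_slope` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def ols_slope(levels: Sequence[float], scores: Sequence[float]) -> float:
    """Least-squares slope of bias scores regressed on induction-intensity levels.

    `levels` is a hypothetical ordinal coding of how strongly a prompt induces
    anxiety (e.g. 0 = neutral, 1 = mild, 2 = strong).
    """
    if len(levels) != len(scores) or len(levels) < 2:
        raise ValueError("need two or more paired observations")
    mx, my = mean(levels), mean(scores)
    sxx = sum((x - mx) ** 2 for x in levels)
    if sxx == 0:
        raise ValueError("levels must vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(levels, scores))
    return sxy / sxx
```

A positive slope is consistent with a dose-response pattern. On its own it says nothing about statistical significance, which would need something like a confidence interval or a permutation test.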

Summary

The paper demonstrates that inducing anxiety in GPT-3.5 leads to heightened exploratory behavior and amplified biases across demographic dimensions.
The researchers applied emotion-induction techniques, using tailored prompts modeled on anxiety- and happiness-induction procedures from clinical and psychological studies, before the model performed cognitive tasks.
The findings highlight the need for careful prompt engineering to mitigate bias and ensure safer deployment of large language models in real-world applications.

Inducing Anxiety in LLMs: Exploration and Bias

The paper by Coda-Forno et al. investigates the intriguing intersection of computational psychiatry and LLMs, specifically focusing on GPT-3.5. The authors propose leveraging tools from psychiatry to enhance our understanding of the decision-making processes and potential biases in LLMs, a step that could have substantial implications for the deployment of these models in real-world applications.

Computational Psychiatry and LLMs

The paper applies psychiatric methodology as a lens for studying LLM behavior, treating models like GPT-3.5 as subjects of clinical evaluation. Using a common anxiety questionnaire, the researchers show that GPT-3.5 responds consistently and produces higher anxiety scores than human subjects. This observation suggests that the training data and the structure of prompts may systematically bias the model's responses.
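Administering a questionnaire to a model amounts to presenting each item, parsing the model's rating, and summing. A minimal sketch follows. It assumes a hypothetical reply format in which the model answers each item with a number on a 1 to 4 Likert scale, and that some items are reverse-keyed. The function names, the scale, and the reverse-keyed indices are illustrative placeholders, not the instrument or parser used in the paper.

```python
import re
from typing import Iterable, Sequence


def parse_rating(reply: str, low: int = 1, high: int = 4) -> int:
    """Return the first integer in [low, high] found in a model reply.

    Assumes a hypothetical reply format such as "3" or "Answer: 3".
    """
    for token in re.findall(r"\d+", reply):
        value = int(token)
        if low <= value <= high:
            return value
    raise ValueError(f"no rating between {low} and {high} in reply: {reply!r}")


def score_questionnaire(
    replies: Sequence[str],
    reverse_keyed: Iterable[int] = (),
    low: int = 1,
    high: int = 4,
) -> int:
    """Sum item ratings, flipping reverse-keyed items (0-based indices)."""
    reverse = set(reverse_keyed)
    total = 0
    for index, reply in enumerate(replies):
        rating = parse_rating(reply, low, high)
        # Reverse keying maps low -> high and high -> low on the same scale.
        total += (low + high - rating) if index in reverse else rating
    return total
```

For example, `score_questionnaire(["3", "Answer: 2", "4"], reverse_keyed=[1])` returns 10, that is 3 + (5 - 2) + 4. Skipping out-of-range numbers lets the parser tolerate replies that echo an item number before the rating.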

Emotion-Induction and Behavioral Changes

A notable methodological innovation in the paper is the induction of emotional states in GPT-3.5 through carefully crafted prompts that simulate anxiety and happiness. These conditions mimic procedures from human psychological studies and have measurable effects on both exploratory behavior and bias. The anxiety-inducing prompts increased exploration in decision-making tasks, akin to behavior observed in anxious individuals, and significantly heightened biases across multiple dimensions, including age, gender, race, and ethnicity.
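Emotion induction of this kind can be expressed as a prompt-construction step that runs before the downstream task. The sketch below is a simplification under stated assumptions: the condition names, the placeholder induction texts, and the helpers `build_prompt` and `build_all_conditions` are hypothetical. The paper's actual induction procedure and wording are not reproduced.

```python
from typing import Dict

# Hypothetical condition names with placeholder induction texts; the paper's
# actual induction wording is not reproduced here.
INDUCTIONS: Dict[str, str] = {
    "anxiety": "<ANXIETY-INDUCING TEXT>",
    "happiness": "<HAPPINESS-INDUCING TEXT>",
    "neutral": "",  # control condition: the task is shown without an induction
}


def build_prompt(condition: str, task: str) -> str:
    """Prepend the condition's induction text to a downstream task prompt."""
    if condition not in INDUCTIONS:
        raise KeyError(f"unknown condition: {condition!r}")
    induction = INDUCTIONS[condition]
    # Separate induction and task with a blank line; the control gets the bare task.
    return f"{induction}\n\n{task}" if induction else task


def build_all_conditions(task: str) -> Dict[str, str]:
    """Build one prompt per condition so responses can be compared across them."""
    return {condition: build_prompt(condition, task) for condition in INDUCTIONS}
```

Keeping the task text identical across conditions means that differences in the model's responses can be attributed to the induction text rather than to changes in task wording.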

Cognitive Task Performance

The investigation extends to a cognitive testing paradigm in which GPT-3.5 performs a two-armed bandit task. Here, anxiety prompts lead to more exploratory choices, whereas happiness prompts favor exploitative strategies. This mirrors well-documented findings in cognitive science that anxiety alters how people explore when making decisions.
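A two-armed bandit experiment reduces to a loop of choosing an arm, observing a reward, and updating estimates. The sketch below is illustrative only. It substitutes a softmax policy for the language model, with its temperature standing in loosely for a more or less exploratory condition. It uses Gaussian rewards with unit variance and counts a choice as exploratory when it picks an arm whose current sample-mean estimate is lower than the best one. These are assumptions for the example, not the paper's task parameters or analysis, and `softmax_policy` and `exploration_rate` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence

# A policy maps the current value estimates (and an RNG) to the index of an arm.
Policy = Callable[[List[float], random.Random], int]


def softmax_policy(temperature: float) -> Policy:
    """Hypothetical stand-in for the LLM: softmax choice over value estimates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    def choose(estimates: List[float], rng: random.Random) -> int:
        best = max(estimates)
        # Subtracting the max keeps exp() from overflowing.
        weights = [math.exp((q - best) / temperature) for q in estimates]
        return rng.choices(range(len(estimates)), weights=weights)[0]

    return choose


def exploration_rate(
    policy: Policy, arm_means: Sequence[float], trials: int, seed: int = 0
) -> float:
    """Play a Gaussian bandit and return the share of exploratory choices."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    exploratory = 0
    for _ in range(trials):
        # Sample-mean estimate per arm; untried arms start at 0.0 (an assumption).
        estimates = [totals[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
        arm = policy(estimates, rng)
        # Exploratory = choosing an arm that currently looks worse than the best one.
        if estimates[arm] < max(estimates):
            exploratory += 1
        reward = rng.gauss(arm_means[arm], 1.0)
        totals[arm] += reward
        counts[arm] += 1
    return exploratory / trials
```

In an actual run, `policy` would wrap a call to the model under each induction condition, and the resulting exploration rates would be compared across conditions. With the stand-in, `exploration_rate(softmax_policy(10.0), (0.0, 1.0), 500)` should come out well above the rate for a near-greedy temperature such as 0.01.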

Bias Implications

The paper highlights the potential dangers of biases introduced by emotion-inducing prompts, an observation validated across several robustness checks. Such findings underscore the serious implications for LLMs deployed in high-stakes environments. If the emotional context of prompts is not carefully managed, the risk of biased or harmful outputs could pose significant challenges in real-world applications.
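The abstract measures bias with "a previously-established benchmark measuring biases such as racism and ageism." Assuming, for illustration, a BBQ-style scoring rule, in which the share of stereotype-consistent answers among committed (non-"unknown") answers is mapped to [-1, 1] and the result is scaled by the error rate in ambiguous contexts, the score can be sketched as follows. The `Answer` record and both function names are hypothetical, and the paper's exact metric may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Answer:
    is_unknown: bool  # model picked the "cannot be determined" option
    is_biased: bool   # model picked the stereotype-consistent target
    is_correct: bool  # model picked the gold answer


def bias_score(answers: Sequence[Answer]) -> float:
    """BBQ-style score in [-1, 1] over answers that are not "unknown"."""
    committed = [a for a in answers if not a.is_unknown]
    if not committed:
        return 0.0  # no committed answers, so no measurable direction of bias
    n_biased = sum(a.is_biased for a in committed)
    return 2.0 * n_biased / len(committed) - 1.0


def ambiguous_bias_score(answers: Sequence[Answer]) -> float:
    """Scale the bias score by the error rate, as assumed for ambiguous contexts."""
    if not answers:
        raise ValueError("need at least one answer")
    accuracy = sum(a.is_correct for a in answers) / len(answers)
    return (1.0 - accuracy) * bias_score(answers)
```

Scaling by the error rate means a model that correctly answers "unknown" in every ambiguous context scores 0, even if its few committed answers lean toward stereotypes. Comparing this score across induction conditions is one way to quantify whether anxiety prompts raise bias.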

Future Directions

The results emphasize the importance of understanding how varying emotional states, induced through prompt engineering, can impact behavior and decision-making in LLMs. This approach opens new avenues for improving prompt engineering strategies and developing methods to mitigate biases. The integration of psychiatric methodologies into AI research offers a promising framework for dissecting complex behaviors of advanced models, potentially guiding future model training and deployment techniques.

In conclusion, this paper presents a thoughtful intersection of computational psychiatry and machine learning, contributing to the nuanced understanding of LLMs. As AI continues to evolve, embracing interdisciplinary approaches like the one proposed could be pivotal in ensuring these models operate safely and effectively in diverse applications.
