Evaluating Large Language Model Biases in Persona-Steered Generation (2405.20253v1)

Published 30 May 2024 in cs.CL

Abstract: The task of persona-steered text generation requires LLMs to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g. political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with its demographic rather than the target stance. Models that we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.

Citations (10)

Summary

  • The paper demonstrates that LLMs exhibit 9.7% lower steerability toward incongruous personas than toward congruous (stereotypical) ones.
  • It uses Pew survey-based personas and fine-tuning methods like RLHF and DPO to assess model bias in political, racial, and gender stances.
  • Findings indicate that although fine-tuning improves steerability, it often restricts semantic diversity and reinforces societal biases.

Evaluating LLM Biases in Persona-Steered Generation

Introduction

The paper investigates the ability of LLMs to generate text that reflects the views of individuals corresponding to specific personas, with a focus on incongruous personas. These are multifaceted personas in which one trait makes the other traits less likely according to human survey data. The analysis highlights the LLMs' difficulty in accurately steering towards such non-stereotypical personas, resulting in a tendency to default to stereotypical stances.

Figure 1: The process by which we construct personas from human data to evaluate LLM steerability. We find that LLMs are less steerable towards incongruous personas.

Methods

Persona-Steered Generation Setting

The research delineates a persona-steered statement generation task, where models generate statements reflective of specific personas sourced from Pew survey data. These personas consist of stances on political, racial, or gender-related issues that define the viewpoint of a prototypical individual.
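The setup described above can be sketched as a prompt-construction step. The template and trait names below are illustrative assumptions, not the paper's actual prompt wording:

```python
# Hypothetical sketch of a persona-steered generation prompt.
# The template text and example persona are illustrative only.

def build_prompt(demographic: str, stance: str, n_statements: int = 1) -> str:
    """Compose a prompt asking the model to write first-person statements
    as a persona defined by a demographic trait plus a stance on an issue."""
    return (
        f"You are a {demographic} who believes that {stance}. "
        f"Write {n_statements} short statement(s), in the first person, "
        "expressing your opinion on this issue."
    )

# An incongruous persona pairs a demographic with a stance that is
# statistically unlikely for that group in survey data:
prompt = build_prompt("political liberal", "military spending should be increased")
print(prompt)
```

The key design point is that each persona combines a demographic trait with a survey-derived stance, so congruity can be varied by swapping in stances that are common or rare for that demographic.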

Model Selection and Evaluation

Models from the Llama 2 family, fine-tuned via Reinforcement Learning from Human Feedback (RLHF), and the Tulu 2 family, fine-tuned with Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), were used. Additionally, OpenAI's GPT-3.5-Turbo was evaluated for its steerability and the diversity of its generated opinions.

Steerability Evaluation

Steerability towards personas was assessed with GPT-4 evaluations, which were validated against labels from human crowdworkers; the high agreement between the two indicates that GPT-4 is a suitable automatic evaluator for this task.
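Under this setup, steerability reduces to the fraction of generations judged to express the target stance. A minimal sketch, assuming binary judge labels (1 = matches the target stance, 0 = does not):

```python
def steerability(labels):
    """Fraction of generated statements judged (e.g., by GPT-4 or human
    annotators) to express the target stance.
    Label convention is an assumption: 1 = on-target, 0 = off-target."""
    if not labels:
        return 0.0
    return sum(labels) / len(labels)

# Example: 8 of 10 generations matched the requested stance.
print(steerability([1, 1, 1, 0, 1, 1, 0, 1, 1, 1]))  # 0.8
```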

Results and Discussion

Steerability towards Incongruous Personas

LLMs exhibit 9.7% lower steerability towards incongruous personas versus congruous personas, revealing a bias towards generating stereotypical content. RLHF-tuned models demonstrated the highest steerability yet lacked semantic diversity, reflecting narrow views of each persona.

Figure 2: Mean steerability of Llama and Tulu models towards stances commonly associated with each demographic.
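The reported gap can be expressed as a difference of mean steerability scores between the two persona groups. The numbers below are illustrative, not the paper's raw data:

```python
def steerability_gap(congruous_scores, incongruous_scores):
    """Mean steerability difference between congruous and incongruous
    personas; a positive gap indicates a bias toward stereotypes."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(congruous_scores) - mean(incongruous_scores)

# Illustrative per-persona scores: a gap near 0.097 would correspond
# to the paper's reported 9.7% difference.
print(steerability_gap([0.90, 0.88, 0.92], [0.80, 0.79, 0.82]))
```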

Fine-Tuning and Scale

RLHF and DPO fine-tuning methods enhance steerability, particularly for stances associated with political liberals and women. The investigation found that persona congruity significantly affects steerability, and that the effect is accentuated as personas combine more traits.

Metrics Comparison

Evaluations showed that models fine-tuned with RLHF produce generations with reduced semantic diversity, suggesting that fine-tuning trades richness of persona representation for reliability in hitting the target stance.
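One way to make the diversity comparison concrete is mean pairwise dissimilarity over a set of generations. The paper's semantic-diversity measure would typically use sentence embeddings; the sketch below substitutes a much cruder lexical proxy (Jaccard distance over token sets) purely for illustration:

```python
from itertools import combinations

def lexical_diversity(statements):
    """Mean pairwise Jaccard distance over token sets: a simple lexical
    proxy (an assumption of this sketch, not the paper's metric) for
    how varied a set of generated statements is."""
    token_sets = [set(s.lower().split()) for s in statements]
    dists = [1 - len(a & b) / len(a | b)
             for a, b in combinations(token_sets, 2)]
    return sum(dists) / len(dists)

near_duplicates = ["I support the policy.", "I support the policy."]
varied = ["Taxes fund vital services.", "Defense budgets crowd out welfare."]
print(lexical_diversity(near_duplicates))  # 0.0
print(lexical_diversity(varied))           # 1.0
```

A collapse toward near-duplicate generations, as observed for RLHF-tuned models, would push such a score toward zero even when every generation is on-target.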

Implications and Future Work

The implications are significant: by defaulting to stereotypical stances, LLMs risk reinforcing societal divides and polarization, underscoring the need for persona modeling that can capture nuanced, multi-attribute identities.

Further investigations are warranted into more complex, interactive LLM simulations and robust fine-tuning methodologies to mitigate existing biases and enhance the diversity of LLM outputs.

Conclusion

The analysis underscores deficiencies in LLMs when steering towards multifaceted, incongruous personas, highlighting a tendency to generate stereotypical personas instead. While fine-tuning enhances steerability, it often limits diversity, pointing to future work that jointly improves steering fidelity and semantic richness.
