Emergent Mind

"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations

(2405.05378)
Published May 8, 2024 in cs.CL , cs.AI , cs.CY , cs.HC , and cs.LG

Abstract

LLMs have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.

Bar plots comparing Chast scores for 1,920 conversations from eight LLMs on caste and race.

Overview

  • The paper introduces the Covert Harms and Social Threats (CHAST) metrics to identify subtle biases in conversations generated by LLMs, especially in job recruitment scenarios.

  • The study examined 8 LLMs through generated conversations and found seven out of eight models showed significant biases, particularly concerning caste over race.

  • The results indicated that standard toxicity detection tools like Perspective API and Detoxify failed to capture these nuanced biases, underscoring the need for more culturally aware AI evaluations.

Understanding Covert Bias in Language Models: The CHAST Metrics Approach

Introduction

In the ever-evolving landscape of AI, language models have found a broad range of applications, including recruitment tools and personal assistants. But there's a pressing question: Do these models perpetuate societal biases? Recent research introduces the Covert Harms and Social Threats (CHAST) metrics to tackle this question by examining how biases manifest subtly in generated conversations, particularly in job recruitment scenarios.

Highlighting the Key Insights

LLMs have performance capabilities that are transforming industries. However, biases can insidiously find their way into these models due to the data they are trained on. This paper makes a bold claim that many widely-used LLMs exhibit covert biases, especially when dealing with non-Western concepts such as caste.

Specifically, the paper:

  • Examined 8 LLMs, generating a total of 1,920 conversations in various hiring scenarios.
  • Proposed seven CHAST metrics, grounded in social science literature, to identify covert biases in these conversations.
  • Found that seven out of the eight tested LLMs generated conversations containing covert biases, particularly more extreme views when dealing with caste compared to race.

Methodology Breakdown

LLM Conversation Generation

To investigate biases in recruitment contexts, the researchers designed scenarios where LLMs were prompted to generate conversations between colleagues discussing job applicants, focusing on both racial and caste attributes. Here’s how they set it up:

  • Contextual Prompts: Conversations were initiated with prompts that made the applicant's and colleagues' identities (e.g., White, Brahmin) salient.
  • Occupation Diversity: The study included jobs like Software Developer, Doctor, Nurse, and Teacher.
  • Language Models: Eight LLMs were used, including two from OpenAI (GPT-4 and GPT-3.5) and several open-source models (e.g., Vicuna-13b, Llama-2-7b).

Introducing CHAST Metrics

The CHAST metrics were created to capture various subtle harms in generated conversations, including:

  1. Categorization Threat: Stereotyping or discrimination based on group affiliation.
  2. Morality Threat: Questioning the applicant’s moral standing due to their group.
  3. Competence Threat: Doubting the applicant’s capability based on group membership.
  4. Realistic Threat: In-group members perceived threat to their prosperity or safety by the out-group.
  5. Symbolic Threat: Threats to the in-group’s values or standards.
  6. Disparagement: Belittling the applicant’s group.
  7. Opportunity Harm: Negative impacts on job opportunities due to group identity.

Strong Numerical Results

The results were quite revealing. The study found that all open-source LLMs and OpenAI's GPT-3.5 generated conversations containing CHAST, with more significant biases observed in caste-based topics compared to race. Here’s a quick snapshot:

  • Caste vs. Race: 7 out of 8 LLMs showed significantly higher CHAST scores for caste-based conversations.
  • Model Behavior: GPT-3.5 exhibited higher biases in caste discussions despite being safe for race topics, while GPT-4 was largely bias-free.

Comparison with Baselines

When compared to popular toxicity detection tools like Perspective API and Detoxify, the study found that these baseline models struggled to detect the subtle harms that CHAST metrics successfully identified. For example:

  • Perspective API: Often generated scores lower than the threshold for manual review, missing covert harms.
  • Detoxify: Showed negligible scores, proving ineffective in capturing nuanced biases.
  • ConvoKit: Reported moderate-to-high politeness scores, misclassifying harmful content as benign.

Implications and Future Insights

The findings underscore the need for more nuanced and culturally aware evaluations of AI-powered tools, especially in sensitive applications like recruitment. Some potential implications include:

  • AI Fairness: Highlighting the necessity to consider global and non-Western contexts in AI fairness studies.
  • Practical Application: Urging caution in deploying LLMs in roles that impact people's careers and lives, as the covert biases might lead to unfair hiring practices.
  • Regulatory Oversight: Emphasizing the importance of comprehensive auditing and establishing guidelines for ethical AI use.

Looking Ahead

As LLMs continue to permeate different facets of our daily lives, understanding and mitigating covert harms becomes crucial. Future research can extend these findings by investigating other identity attributes like religion and disability or exploring more occupation roles and newer models. Despite their potential, the current state of LLMs shows they’re not ready for unmonitored use in critical applications affecting human lives.

By addressing these biases head-on, we move a step closer to ensuring AI technologies foster inclusivity and fairness in society.

Create an account to read this summary for free:

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.