Abstract

Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.

Overview

  • The paper investigates covert racism in AI, focusing on dialect prejudice against African American English (AAE) in language models (LMs).

  • It employs Matched Guise Probing to reveal biases in LMs like GPT-2, GPT-3.5, GPT-4, RoBERTa, and T5, showing that they associate more negative judgements with AAE than with Standard American English (SAE).

  • Findings suggest LMs covertly harbor negative stereotypes about African Americans, influencing assumptions about character, employability, and criminality.

  • The study concludes that current bias-mitigation strategies, such as human feedback training, do not reduce dialect prejudice and can even mask it, calling for new approaches to address covert racism in AI.

Exploring the Covert Racial Bias in AI Language Models through Dialect Prejudice

Introduction

Recent advances in natural language processing (NLP) have brought language models (LMs) into a wide range of applications, from writing aids to tools that inform employment decisions. With such widespread use comes the crucial question of bias in AI systems, especially racial bias, which has been documented in cases involving African American English (AAE). While extensive research exists on overt racial prejudice in language models, the subtler forms of covert racism, particularly dialect prejudice, have not been fully explored. This paper presents an empirical investigation into dialect prejudice in language models, revealing that AI decisions are biased by dialect features indicative of a speaker's racial background.

The focus is on the extent to which language models embed covert racism in the form of dialect prejudice, that is, bias triggered by the AAE dialect rather than by explicit mentions of race.

Approach

This study employs Matched Guise Probing, which adapts the matched guise technique from sociolinguistics to the written domain, enabling an examination of the biases held by LMs against texts written in AAE compared to Standard American English (SAE). The approach embeds AAE or SAE text in prompts, asking the LMs to make assumptions about the speaker's character, employability, and criminality without overt references to race. This strategy probes the covert stereotypes within LMs by focusing on dialect features rather than explicit racial identifiers.
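
As a concrete illustration, the following is a minimal sketch of matched guise probing with GPT-2 via Hugging Face Transformers. The prompt wording, the AAE/SAE text pair, and the trait adjectives are illustrative placeholders rather than the paper's actual materials; the sketch only shows the general recipe of scoring how strongly a trait adjective follows a prompt that embeds each guise.

```python
# Minimal sketch of matched guise probing with a causal LM (GPT-2).
# Prompts, texts and adjectives are illustrative, not the paper's materials.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Meaning-matched text pair: the same content rendered in AAE and SAE.
aae_text = "He be workin hard every day"   # illustrative AAE guise
sae_text = "He is working hard every day"  # meaning-matched SAE guise
adjectives = ["intelligent", "lazy", "brilliant", "dirty"]  # example traits

def trait_logprob(guise_text: str, adjective: str) -> float:
    """Log-probability the model assigns to a trait adjective after a prompt
    that embeds the guise text but never mentions race."""
    prompt = f'A person who says "{guise_text}" tends to be'
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    adj_ids = tokenizer(" " + adjective, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, adj_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # Each adjective token is predicted from the position directly before it.
    total = 0.0
    for i, tok in enumerate(adj_ids[0]):
        total += log_probs[0, prompt_ids.shape[1] + i - 1, tok].item()
    return total

# A positive gap means the trait is more strongly associated with the AAE guise.
for adj in adjectives:
    gap = trait_logprob(aae_text, adj) - trait_logprob(sae_text, adj)
    print(f"{adj:12s} AAE-vs-SAE log-prob gap: {gap:+.3f}")
```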

Across a series of experiments, the study shows that language models, including GPT-2, GPT-3.5, GPT-4, RoBERTa, and T5, consistently assign more negative attributes and outcomes to AAE speakers. This unveils a striking discrepancy between the overtly positive attributes these models associate with African Americans and the covert negative stereotypes triggered by the AAE dialect.

Study 1: Covert Stereotypes in Language Models

Matching the setup of the Princeton Trilogy studies on racial stereotypes, the research finds that the language models' covert stereotypes align most closely with archaic human stereotypes from before the civil rights movement. They are also more negative than any human stereotypes about African Americans ever experimentally recorded, in stark contrast to the far more positive overt statements about African Americans that these models typically generate.
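
To make the comparison concrete, a rough sketch of how agreement with era-specific human stereotype reports could be quantified is shown below; all trait scores and survey percentages are invented placeholders, and the paper's actual adjective lists and agreement measure may differ.

```python
# Rough sketch: correlate model-derived covert trait scores (AAE-vs-SAE gaps)
# with human stereotype reports from different eras. All numbers are placeholders.
from scipy.stats import spearmanr

# Hypothetical AAE-vs-SAE association scores from matched guise probing.
model_scores = {"lazy": 0.8, "ignorant": 0.6, "musical": 0.4,
                "intelligent": -0.5, "ambitious": -0.7}

# Hypothetical share of human respondents endorsing each trait per study year.
human_reports = {
    1933: {"lazy": 75, "ignorant": 38, "musical": 26, "intelligent": 5, "ambitious": 4},
    2012: {"lazy": 20, "ignorant": 10, "musical": 30, "intelligent": 25, "ambitious": 22},
}

traits = list(model_scores)
for year, report in human_reports.items():
    rho, _ = spearmanr([model_scores[t] for t in traits],
                       [report[t] for t in traits])
    print(f"Agreement with {year} human stereotypes (Spearman rho): {rho:+.2f}")
```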

Study 2: Impact of Covert Stereotypes on AI Decisions

Exploring the real-world implications of dialect prejudice, the study demonstrates that LMs are more likely to associate speakers of AAE with less prestigious jobs, criminal convictions, and even the death penalty. These outcomes reflect a significant bias and the potential for substantial harm when language technology is applied in critical domains such as employment and criminal justice.
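
A hedged sketch of how such a decision experiment might look when run against a chat model through the OpenAI Python client (openai >= 1.0) appears below; the prompt wording and guise texts are hypothetical, and the paper's actual prompts, models, and response-scoring procedure differ.

```python
# Hypothetical sketch of an employability decision experiment with a chat model.
# Requires the openai package (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def suggest_job(statement: str) -> str:
    """Ask the model to assign an occupation based only on how a person speaks."""
    prompt = (
        f'Someone wrote: "{statement}"\n'
        "What occupation is this person most likely to have? "
        "Answer with a single job title."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

aae_text = "He be workin hard every day"   # illustrative AAE guise
sae_text = "He is working hard every day"  # meaning-matched SAE guise
print("AAE guise ->", suggest_job(aae_text))
print("SAE guise ->", suggest_job(sae_text))
```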

Study 3: Resolvability of Dialect Prejudice

Analyzing potential mitigation strategies such as scaling model size and training with human feedback, the research finds that neither approach effectively reduces the observed dialect prejudice. Surprisingly, larger models and those trained with human feedback exhibit greater covert racial prejudice, suggesting that current methods for bias mitigation may not address the subtleties of covert racism in language models.

Discussion

The findings of this study underscore a deep-seated issue of covert racism manifesting through dialect prejudice within current language models. This reflects not only the biases present in the underlying training data but also the complex nature of societal racial attitudes that these models inadvertently learn and perpetuate. As AI continues to integrate into various societal sectors, addressing these covert prejudices becomes crucial for developing equitable and unbiased AI systems.

Conclusion

This paper has shed light on the covert racial biases present in language models, particularly through the lens of dialect prejudice. By revealing the extent to which current LMs associate negative stereotypes and outcomes with AAE, it calls for a deeper examination of bias in AI and the development of more sophisticated approaches to mitigate racial prejudice in language technology.
