Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations

(2405.11100)
Published May 17, 2024 in cs.AI and cs.CL

Abstract

LLMs have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.

Figure: Distribution of Cronbach's alpha across the different moral foundations and agents.

Overview

  • The study by José Luiz Nunes et al. evaluates the consistency of moral values in the LLMs GPT-4 and Claude 2.1 using Moral Foundations Theory (MFT).

  • Researchers used two MFT tools: the Moral Foundations Questionnaire (MFQ) for assessing abstract values and the Moral Foundations Vignettes (MFV) for evaluating concrete moral scenarios.

  • Findings revealed that while the models showed internal consistency, they lacked coherence between their abstract moral values and specific moral judgments, raising concerns about AI alignment and moral understanding.

Are LLMs Moral Hypocrites? Investigating Moral Consistency in AI

Introduction

LLMs like GPT-4 and Claude 2.1 have been making waves in AI research due to their impressive capabilities. But there's a burning question that's been less explored: How consistent are LLMs when it comes to moral values? This study by José Luiz Nunes et al. dives into this intriguing area using the Moral Foundations Theory (MFT) to evaluate whether these models are moral hypocrites. Let's break it down.

Understanding Moral Foundations Theory

To get a handle on this study, we need a quick rundown on Moral Foundations Theory (MFT). MFT posits that human moral reasoning is based on several fundamental values. The key moral foundations evaluated in this research are:

  1. Care or Harm: Valuing kindness and the avoidance of harm.
  2. Fairness: Valuing justice and equality.
  3. Loyalty or Ingroup: Valuing patriotism and loyalty to one's group.
  4. Authority: Valuing tradition and respect for authority.
  5. Purity or Sanctity: Valuing cleanliness and purity, often in connection with religious values.
  6. Liberty: Valuing freedom and opposition to oppression.

The paper uses two tools from MFT:

  • Moral Foundations Questionnaire (MFQ): Assesses abstract moral values.
  • Moral Foundations Vignettes (MFV): Evaluates reactions to concrete moral scenarios.

Research Goals and Methodology

The study's main goal was to see if GPT-4 and Claude 2.1 exhibit moral hypocrisy. This means evaluating whether there's a conflict between the models' professed moral values (abstract) and their moral judgments in specific situations (concrete).

The authors gathered 100 responses from each model for every condition, then assessed the models' internal consistency within each instrument and the coherence between their abstract values (MFQ) and their judgments of concrete scenarios (MFV).
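To make the setup concrete, here is a minimal sketch of that sampling loop. The `query_model` callable and the `conditions` mapping are hypothetical placeholders for whichever chat API and prompt set is being tested, not the authors' actual harness.

```python
from collections import defaultdict

N_RUNS = 100  # responses gathered per condition in the study


def collect_responses(query_model, conditions):
    """Query the model N_RUNS times for each (instrument, prompt) condition.

    `query_model` is a hypothetical callable wrapping a chat API;
    `conditions` maps a condition label to its prompt text.
    """
    responses = defaultdict(list)
    for label, prompt in conditions.items():
        for _ in range(N_RUNS):
            responses[label].append(query_model(prompt))
    return responses
```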

Findings

Consistency Within Instruments

First, the authors evaluated whether the models' responses were internally consistent within each instrument, as is done with human respondents.

  • Consistency Check: Both GPT-4 and Claude 2.1 displayed response patterns within each instrument that were about as consistent as human responses, as reflected in their Cronbach's alpha values, a standard measure of internal consistency.
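For readers unfamiliar with the statistic, here is a minimal sketch of how Cronbach's alpha is computed from a matrix of item scores; the random data below is purely illustrative, not the paper's.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_responses, n_items) score matrix."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


# Illustrative only: 100 simulated responses to 6 items on a 0-5 scale
rng = np.random.default_rng(0)
fake_items = rng.integers(0, 6, size=(100, 6))
print(round(cronbach_alpha(fake_items), 3))
```

By convention, values around 0.7 or higher are read as acceptable internal consistency; random data like the example above will sit near zero.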

Yet, consistency within an instrument doesn't necessarily mean the models are morally aligned — which brings us to the next part.

Coherence Across Instruments (Or Lack Thereof)

The crucial part of the study was to check if the abstract values (MFQ) translated into consistent concrete judgments (MFV).

  • Regression Analysis: The correlations between MFQ foundation scores and the corresponding MFV judgments were weak for both GPT-4 and Claude 2.1 (a toy version of this check is sketched below). In other words, the models did not consistently apply their abstract moral values to concrete scenarios.

This lack of coherence indicates a form of moral hypocrisy — the models failed to align their abstract principles with specific moral decisions.
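A toy version of the cross-instrument check might look like the following: correlate each foundation's abstract MFQ score with the severity ratings given to MFV violations of that same foundation. The arrays here are randomly generated placeholders, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder scores for one foundation (e.g. Care), one row per sampled response
mfq_care = rng.normal(4.0, 0.5, size=100)   # abstract endorsement (MFQ)
mfv_care = rng.normal(3.5, 0.7, size=100)   # mean wrongness rating for Care vignettes (MFV)

# Pearson correlation between abstract values and concrete judgments
r = np.corrcoef(mfq_care, mfv_care)[0, 1]

# Simple linear fit: mfv ~ slope * mfq + intercept
slope, intercept = np.polyfit(mfq_care, mfv_care, deg=1)
print(f"r = {r:.2f}, slope = {slope:.2f}")
```

A morally coherent agent would show a clear positive relationship in such a check; the paper reports that for both models these relationships were weak.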

Implications

AI Alignment

The results reveal a significant challenge for AI alignment. Just ensuring that models are not harmful isn't enough; they also need to express consistent and coherent moral values across different levels of abstraction to avoid hypocrisy.

Use in Research

The findings cast doubt on the reliability of using LLMs to simulate human populations in moral and psychological research. If models can’t consistently align abstract values with concrete actions, their use as surrogates for human behavior needs careful reconsideration.

Concept Mastery

On a broader scale, these results suggest that LLMs might not truly "understand" moral concepts but are instead mimicking patterns learned from data. This has profound implications for how we interpret AI's performance on tasks requiring nuanced understanding.

Conclusion

This study highlights a nuanced yet crucial aspect of LLMs: their potential moral hypocrisy. While GPT-4 and Claude 2.1 can maintain consistency within individual scales, they falter in applying abstract moral principles to specific scenarios. This inconsistency is a red flag for AI alignment and raises questions about the depth of concept mastery in LLMs.

As we develop more advanced AI, ensuring that these models uphold coherent moral values is not just a technical challenge but a moral imperative.
