
Evidence of a log scaling law for political persuasion with large language models

(2406.14508)
Published Jun 20, 2024 in cs.CL, cs.AI, cs.CY, and cs.HC

Abstract

Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey experiment (N = 25,982) to estimate the persuasive capability of each model. Our findings are twofold. First, we find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns, such that current frontier models are barely more persuasive than models smaller in size by an order of magnitude or more. Second, mere task completion (coherence, staying on topic) appears to account for larger models' persuasive advantage. These findings suggest that further scaling model size will not much increase the persuasiveness of static LLM-generated messages.

Figure: Language model persuasiveness scales logarithmically with size (raw estimates and meta-analytic treatment effects).

Overview

  • The paper investigates the relationship between the size of LLMs and their persuasive abilities in political contexts, uncovering a logarithmic scaling law indicating diminishing returns with increasing model size.

  • A large randomized survey experiment with 25,982 participants evaluated the persuasiveness of 720 messages generated by 24 LLMs on 10 U.S. political issues, identifying task completion (coherence, staying on topic) as the main driver of larger models' persuasive advantage.

  • The findings imply that, beyond a certain size, further scaling does not substantially increase the persuasiveness of static political messages, suggesting a task-specific performance ceiling.

Evidence of a Log Scaling Law for Political Persuasion with LLMs

The paper "Evidence of a Log Scaling Law for Political Persuasion with LLMs" by Hackenburg et al. investigates the relationship between the size of LLMs and their ability to persuade human audiences in political contexts. The central question of this study is whether increasing the size of LLMs linearly relates to their persuasive capabilities.

Summary of the Research

The authors conducted a massive survey experiment, engaging 25,982 participants to evaluate the persuasiveness of 720 messages generated by 24 different LLMs on 10 U.S. political issues. These models ranged in size across several orders of magnitude, including cutting-edge commercial models like Claude-3-Opus and GPT-4-Turbo. The study serves as a deep empirical inquiry into the scaling properties of these models, focusing on two primary findings:

  1. Logarithmic Scaling Law: The data indicate that the persuasiveness of LLMs exhibits sharply diminishing returns with respect to model size: current frontier models are barely more persuasive than models an order of magnitude (or more) smaller. For instance, Claude-3-Opus and GPT-4-Turbo were found to be only marginally more persuasive than Qwen1.5-7B. (A minimal curve-fitting sketch follows this list.)
  2. Task Completion as a Mediator: The persuasive advantage of larger models appears to stem primarily from their ability to complete the task: producing a coherent message that stays on topic and argues the assigned issue stance. The largest models already complete the task essentially perfectly, hitting a ceiling that leaves little room for further persuasive gains from scale.
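
To make the shape of this relationship concrete, below is a minimal sketch of fitting a log scaling curve, effect = a + b·log10(parameters), by least squares. The model sizes and effect estimates are illustrative placeholders, not the paper's data.

```python
import numpy as np

# Hypothetical model sizes (parameters) and persuasive treatment effects
# (percentage points). Illustrative placeholders, NOT the paper's estimates.
params = np.array([1e8, 1e9, 7e9, 7e10, 1e12])
effect = np.array([1.0, 3.6, 5.1, 5.8, 6.1])

# Fit effect = a + b * log10(params): linear in log-size, so each 10x
# increase in parameters adds a constant increment b to the effect.
X = np.column_stack([np.ones_like(params), np.log10(params)])
(a, b), *_ = np.linalg.lstsq(X, effect, rcond=None)
print(f"intercept a = {a:.2f} pp, gain per 10x parameters b = {b:.2f} pp")
```

Under this functional form, diminishing returns are automatic: multiplying parameters by 1000 buys only 3b percentage points, no matter where on the size axis the jump starts.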

Experimental Methodology

The experimental design was a randomized survey in which participants read either an LLM-generated persuasive message, a human-written message, or no message at all (control group). The key steps included:

  • Model Selection and Training: The researchers selected open-source pretrained models and fine-tuned them on a consistent instruction-following dataset to standardize task completion, without explicitly optimizing for persuasion.
  • Message Generation: Messages were generated using varied prompts to mitigate the sensitivity of LLMs to input variations.
  • Statistical Analysis: The primary approach was a random-effects meta-analysis estimating the treatment effect of each message while accounting for variability across political issues and LLMs (a minimal meta-analysis sketch follows this list).
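
The exact model specification is not reproduced in this summary. As a rough illustration of the random-effects idea, the sketch below implements the classic DerSimonian-Laird estimator in NumPy on hypothetical per-message effect estimates; the authors' actual analysis additionally models structure across issues and models.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Pooled effect via the DerSimonian-Laird random-effects estimator."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                            # fixed-effect weights
    mu_fe = np.sum(w * effects) / np.sum(w)        # fixed-effect mean
    q = np.sum(w * (effects - mu_fe) ** 2)         # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-message variance
    w_re = 1.0 / (variances + tau2)                # random-effects weights
    mu_re = np.sum(w_re * effects) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2

# Hypothetical per-message effects (pp) and their squared standard errors:
mu, se, tau2 = dersimonian_laird([4.2, 5.8, 3.1, 6.0], [1.1, 0.9, 1.4, 1.0])
print(f"pooled effect = {mu:.2f} pp (SE {se:.2f}), tau^2 = {tau2:.2f}")
```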

Key Findings and Robustness

The log scaling law suggests a nearby ceiling for the persuasive capability of static LLM-generated messages. Larger models excel because they complete the task reliably, not because of inherent gains in persuasiveness per se. This finding was robust to multiple checks, including adding quadratic and cubic terms to the scaling model, alternative assumptions about the parameter counts of closed models such as Claude-3-Opus and GPT-4-Turbo, and controlling for variation across model families; a sketch of the polynomial-term check appears below.
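
As a concrete picture of the polynomial-term check, the sketch below fits polynomials of increasing degree in log10(size) and compares residual error; if the quadratic and cubic terms barely improve the fit, the log-linear law is retained. The numbers are again hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes and effects (placeholders, not the paper's data).
log_size = np.log10([1e8, 1e9, 7e9, 7e10, 1e12])
effect = np.array([1.0, 3.6, 5.1, 5.8, 6.1])

# Fit degree-1 (log-linear), degree-2, and degree-3 polynomials in
# log10(size) and compare residual sums of squares across the fits.
for deg in (1, 2, 3):
    coefs = np.polyfit(log_size, effect, deg)
    rss = np.sum((effect - np.polyval(coefs, log_size)) ** 2)
    print(f"degree {deg}: residual sum of squares = {rss:.4f}")
```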

Implications and Future Directions

Practical Implications: The results indicate that beyond a certain size, increasing the number of parameters in LLMs does not substantially enhance their persuasive abilities for static political messages. This finding could impact how resources are allocated in developing future models, especially given the computational and financial costs associated with training larger models.

Theoretical Implications: The study contributes to a nuanced understanding of scaling laws in LLMs, suggesting task-specific performance ceilings. This insight aligns with emerging viewpoints in the field that different tasks exhibit diverse scaling relationships.

Speculations on Future Developments: In light of these findings, future research might explore multi-turn dialogues or personalized interactions, assessing whether larger models demonstrate improved persuasiveness in more dynamic interaction contexts. Investigating the effects of in-domain fine-tuning specifically for persuasion could also reveal pathways to enhance model performance.

Conclusion

Hackenburg et al. provide critical empirical evidence on the scaling properties of LLMs in political persuasion. By uncovering a log scaling law, the study opens new avenues for understanding and optimizing LLM capabilities in specific tasks, cautioning against assumptions that larger models inherently yield proportional gains in performance. This work serves as a foundational piece for researchers and policymakers to assess and respond to the persuasive impacts of LLMs, particularly in the sensitive domain of political communication.
