
Evidence of a log scaling law for political persuasion with large language models

(2406.14508)
Published Jun 20, 2024 in cs.CL, cs.AI, cs.CY, and cs.HC

Abstract

Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey experiment (N = 25,982) to estimate the persuasive capability of each model. Our findings are twofold. First, we find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns, such that current frontier models are barely more persuasive than models smaller in size by an order of magnitude or more. Second, mere task completion (coherence, staying on topic) appears to account for larger models' persuasive advantage. These findings suggest that further scaling model size will not much increase the persuasiveness of static LLM-generated messages.

Figure: Language model persuasiveness scales logarithmically with size (raw estimates and meta-analytic treatment effects).

Overview

  • The paper investigates the relationship between the size of LLMs and their persuasive abilities in political contexts, uncovering a logarithmic scaling law indicating diminishing returns with increasing model size.

  • A large randomized survey experiment with 25,982 participants evaluated the persuasiveness of 720 messages generated by 24 LLMs on 10 U.S. political issues, identifying task completion (coherence, staying on topic) as the main driver of larger models' persuasive advantage.

  • The findings imply that, beyond a certain size, further scaling does not substantially increase the persuasiveness of static political messages, suggesting a task-specific performance ceiling.

Evidence of a Log Scaling Law for Political Persuasion with LLMs

The paper "Evidence of a Log Scaling Law for Political Persuasion with LLMs" by Hackenburg et al. investigates the relationship between the size of LLMs and their ability to persuade human audiences in political contexts. The central question of this study is whether increasing the size of LLMs linearly relates to their persuasive capabilities.

Summary of the Research

The authors conducted a massive survey experiment, engaging 25,982 participants to evaluate the persuasiveness of 720 messages generated by 24 different LLMs on 10 U.S. political issues. These models ranged in size across several orders of magnitude, including cutting-edge commercial models like Claude-3-Opus and GPT-4-Turbo. The study serves as a deep empirical inquiry into the scaling properties of these models, focusing on two primary findings:

  1. Logarithmic Scaling Law: The data indicate that the persuasiveness of LLMs exhibits sharply diminishing returns with respect to model size: current frontier models are barely more persuasive than models an order of magnitude (or more) smaller. For instance, Claude-3-Opus and GPT-4-Turbo were found to be only marginally more persuasive than Qwen1.5-7B. (A minimal curve-fitting sketch follows this list.)
  2. Task Completion as a Mediator: The persuasive advantage of larger models appears to stem primarily from their ability to complete the task: producing a coherent message that stays on topic and argues the assigned issue stance. The largest models already complete the task essentially perfectly, hitting a ceiling that leaves little room for further persuasive gains from scale.
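
To make the shape of this relationship concrete, below is a minimal sketch of fitting a log scaling curve, effect = a + b·log10(parameters), by least squares. The model sizes and effect estimates are illustrative placeholders, not the paper's data.

```python
import numpy as np

# Hypothetical model sizes (parameters) and persuasive treatment effects
# (percentage points). Illustrative placeholders, NOT the paper's estimates.
params = np.array([1e8, 1e9, 7e9, 7e10, 1e12])
effect = np.array([1.0, 3.6, 5.1, 5.8, 6.1])

# Fit effect = a + b * log10(params): linear in log-size, so each 10x
# increase in parameters adds a constant increment b to the effect.
X = np.column_stack([np.ones_like(params), np.log10(params)])
(a, b), *_ = np.linalg.lstsq(X, effect, rcond=None)
print(f"intercept a = {a:.2f} pp, gain per 10x parameters b = {b:.2f} pp")
```

Under this functional form, diminishing returns are automatic: multiplying parameters by 1000 buys only 3b percentage points, no matter where on the size axis the jump starts.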

Experimental Methodology

The experimental design was a randomized survey in which participants read either an LLM-generated persuasive message, a human-written message, or no message at all (control group). The key steps included:

  • Model Selection and Training: The researchers selected open-source pretrained models and fine-tuned them on a consistent instruction-following dataset to standardize task completion, without explicitly optimizing for persuasion.
  • Message Generation: Messages were generated using varied prompts to mitigate the sensitivity of LLMs to input variations.
  • Statistical Analysis: The primary approach was a random-effects meta-analysis estimating the treatment effect of each message while accounting for variability across political issues and LLMs (a minimal meta-analysis sketch follows this list).
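
The exact model specification is not reproduced in this summary. As a rough illustration of the random-effects idea, the sketch below implements the classic DerSimonian-Laird estimator in NumPy on hypothetical per-message effect estimates; the authors' actual analysis additionally models structure across issues and models.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Pooled effect via the DerSimonian-Laird random-effects estimator."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                            # fixed-effect weights
    mu_fe = np.sum(w * effects) / np.sum(w)        # fixed-effect mean
    q = np.sum(w * (effects - mu_fe) ** 2)         # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-message variance
    w_re = 1.0 / (variances + tau2)                # random-effects weights
    mu_re = np.sum(w_re * effects) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se_re, tau2

# Hypothetical per-message effects (pp) and their squared standard errors:
mu, se, tau2 = dersimonian_laird([4.2, 5.8, 3.1, 6.0], [1.1, 0.9, 1.4, 1.0])
print(f"pooled effect = {mu:.2f} pp (SE {se:.2f}), tau^2 = {tau2:.2f}")
```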

Key Findings and Robustness

The log scaling law suggests a nearby ceiling for the persuasive capability of static LLM-generated messages. Larger models excel because they complete the task reliably, not because of inherent gains in persuasiveness per se. This finding was robust to multiple checks, including adding quadratic and cubic terms to the scaling model, alternative assumptions about the parameter counts of closed models such as Claude-3-Opus and GPT-4-Turbo, and controlling for variation across model families; a sketch of the polynomial-term check appears below.
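
As a concrete picture of the polynomial-term check, the sketch below fits polynomials of increasing degree in log10(size) and compares residual error; if the quadratic and cubic terms barely improve the fit, the log-linear law is retained. The numbers are again hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes and effects (placeholders, not the paper's data).
log_size = np.log10([1e8, 1e9, 7e9, 7e10, 1e12])
effect = np.array([1.0, 3.6, 5.1, 5.8, 6.1])

# Fit degree-1 (log-linear), degree-2, and degree-3 polynomials in
# log10(size) and compare residual sums of squares across the fits.
for deg in (1, 2, 3):
    coefs = np.polyfit(log_size, effect, deg)
    rss = np.sum((effect - np.polyval(coefs, log_size)) ** 2)
    print(f"degree {deg}: residual sum of squares = {rss:.4f}")
```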

Implications and Future Directions

Practical Implications: The results indicate that beyond a certain size, increasing the number of parameters in LLMs does not substantially enhance their persuasive abilities for static political messages. This finding could impact how resources are allocated in developing future models, especially given the computational and financial costs associated with training larger models.

Theoretical Implications: The study contributes to a nuanced understanding of scaling laws in LLMs, suggesting task-specific performance ceilings. This insight aligns with emerging viewpoints in the field that different tasks exhibit diverse scaling relationships.

Speculations on Future Developments: In light of these findings, future research might explore multi-turn dialogues or personalized interactions, assessing whether larger models demonstrate improved persuasiveness in more dynamic interaction contexts. Investigating the effects of in-domain fine-tuning specifically for persuasion could also reveal pathways to enhance model performance.

Conclusion

Hackenburg et al. provide critical empirical evidence on the scaling properties of LLMs in political persuasion. By uncovering a log scaling law, the study opens new avenues for understanding and optimizing LLM capabilities in specific tasks, cautioning against assumptions that larger models inherently yield proportional gains in performance. This work serves as a foundational piece for researchers and policymakers to assess and respond to the persuasive impacts of LLMs, particularly in the sensitive domain of political communication.
