Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models (2404.10975v1)
Abstract: As AI systems like LLMs are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses an LLM to translate causal graphs that capture key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two LLMs (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) resulted in lower permissibility and higher intention ratings for both participants and LLMs. The same pattern was observed for evitable versus inevitable harmful outcomes. However, there was no clear effect of whether the harm resulted from an agent's action versus from having omitted to act. We discuss limitations of our prompt generation pipeline and opportunities for improving scenarios to increase the strength of experimental effects.
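The item counts in the abstract can be illustrated with a short sketch. It assumes that the eight conditions come from fully crossing the three binary factors the abstract names (means vs. side effect, evitable vs. inevitable, action vs. omission), which is consistent with 50 scenarios yielding 400 items but is not stated outright. The factor names, the function names, and the item layout below are hypothetical and are not taken from the authors' pipeline.

```python
from itertools import product

# Three binary factors named in the abstract. Fully crossing them is an
# assumption made here to arrive at eight conditions.
FACTORS = {
    "causal_structure": ("means", "side_effect"),
    "evitability": ("evitable", "inevitable"),
    "action": ("action", "omission"),
}


def enumerate_conditions(factors=FACTORS):
    """Return one dict per cell of the fully crossed design."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def build_items(n_scenarios, conditions):
    """Pair every scenario index with every condition (hypothetical layout)."""
    return [
        {"scenario_id": s, **cond}
        for s in range(n_scenarios)
        for cond in conditions
    ]


conditions = enumerate_conditions()
items = build_items(50, conditions)
print(len(conditions), len(items))  # 8 400
```

Under this assumption, each scenario contributes one item per condition, so comparing ratings across conditions within a scenario isolates the effect of a single factor.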