Are Language Models Consequentialist or Deontological Moral Reasoners?
Journal:
arXiv
Published Date:
May 27, 2025
Abstract
As AI systems increasingly navigate applications in healthcare, law, and
governance, understanding how they handle ethically complex scenarios becomes
critical. Previous work has mainly examined the moral judgments in large
language models (LLMs), rather than their underlying moral reasoning process.
In contrast, we focus on a large-scale analysis of the moral reasoning traces
provided by LLMs. Furthermore, unlike prior work that attempted to draw
inferences from only a handful of moral dilemmas, our study leverages over 600
distinct trolley problems as probes for revealing the reasoning patterns that
emerge within different LLMs. We introduce and test a taxonomy of moral
rationales to systematically classify reasoning traces according to two main
normative ethical theories: consequentialism and deontology. Our analysis
reveals that LLM chains-of-thought tend to favor deontological principles based
on moral obligations, while post-hoc explanations shift notably toward
consequentialist rationales that emphasize utility. Our framework provides a
foundation for understanding how LLMs process and articulate ethical
considerations, an important step toward safe and interpretable deployment of
LLMs in high-stakes decision-making environments. Our code is available at
https://github.com/keenansamway/moral-lens .