The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas
Journal: arXiv
Published Date: Mar 25, 2025
Abstract
The question of how to make decisions that maximise the well-being of all
persons is highly relevant to designing language models that are beneficial to
humanity and free from harm. We introduce the Greatest Good Benchmark to
evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis
across 15 diverse LLMs reveals consistently encoded moral preferences that
diverge from established moral theories and from the moral standards of the lay population.
Most LLMs show a marked preference for impartial beneficence and a rejection of
instrumental harm. These findings showcase the 'artificial moral compass' of
LLMs, offering insights into their moral alignment.