Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation.

Journal: PNAS nexus

Published Date: Mar 12, 2025

Abstract

In traditional decision-making processes, social biases of human decision makers can lead to unequal economic outcomes for underrepresented social groups, such as women and racial/ethnic minorities (1-4). Recently, the growing popularity of large language model (LLM)-based AI signals a potential shift from human to AI-based decision-making. How would this transition affect the distributional outcomes across social groups? Here, we investigate the gender and racial biases of a number of commonly used LLMs, including OpenAI's GPT-3.5 Turbo and GPT-4o, Google's Gemini 1.5 Flash, Anthropic AI's Claude 3.5 Sonnet, and Meta's Llama 3-70b, in a high-stakes decision-making setting of assessing entry-level job candidates from diverse social groups. Instructing the models to score ∼361,000 resumes with randomized social identities, we find that the LLMs award higher assessment scores for female candidates with similar work experience, education, and skills, but lower scores for black male candidates with comparable qualifications. These biases may result in ∼1-3 percentage-point differences in hiring probabilities for otherwise similar candidates at a certain threshold and are consistent across various job positions and subsamples. Meanwhile, many models are biased against black male candidates. Our results indicate that LLM-based AI systems demonstrate significant biases, varying in terms of the directions and magnitudes across different social groups. Further research is needed to comprehend the root causes of these outcomes and develop strategies to minimize the remaining biases in AI systems. As AI-based decision-making tools are increasingly employed across diverse domains, our findings underscore the necessity of understanding and addressing the potential unequal outcomes to ensure equitable outcomes across social groups.

Authors

Jiafu An

Department of Real Estate and Construction, University of Hong Kong, Hong Kong SAR 999077, China.
Difang Huang

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Chen Lin

Faculty of Business and Economics, University of Hong Kong, Hong Kong SAR 999077, China.
Mingzhu Tai

Faculty of Business and Economics, University of Hong Kong, Hong Kong SAR 999077, China.

Keywords

No keywords available for this article.

External Resources

View on PubMed Access via DOI PubMed (40144775)

Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation.

Abstract

Authors

Keywords

External Resources

Popular Topics

Recent Journals