Who Gets the Callback? Generative AI and Gender Bias
Journal:
arXiv
Published Date:
Apr 30, 2025
Abstract
Generative artificial intelligence (AI), particularly large language models
(LLMs), is being rapidly deployed in recruitment and for candidate
shortlisting. We audit several mid-sized open-source LLMs for gender bias using
a dataset of 332,044 real-world online job postings. For each posting, we
prompt the model to recommend whether an equally qualified male or female
candidate should receive an interview callback. We find that most models tend
to favor men, especially for higher-wage roles. Mapping job descriptions to the
Standard Occupational Classification system, we find lower callback rates for
women in male-dominated occupations and higher rates in female-associated ones,
indicating occupational segregation. A comprehensive analysis of linguistic
features in job ads reveals strong alignment of model recommendations with
traditional gender stereotypes. To examine the role of recruiter identity, we
steer model behavior by infusing Big Five personality traits and simulating the
perspectives of historical figures. We find that less agreeable personas reduce
stereotyping, consistent with an agreeableness bias in LLMs. Our findings
highlight how AI-driven hiring may perpetuate biases in the labor market and
have implications for fairness and diversity within firms.