Estimating wage disparities using foundation models.
Journal: Proceedings of the National Academy of Sciences of the United States of America
Published Date: May 30, 2025
Abstract
The rise of foundation models marks a paradigm shift in machine learning: instead of training specialized models from scratch, foundation models are trained on massive datasets before being adjusted or fine-tuned to make predictions on smaller datasets. Initially developed for text, foundation models have also excelled at making predictions about social science data. However, while many estimation problems in the social sciences use prediction as an intermediate step, they ultimately require different criteria for success. In this paper, we develop methods for fine-tuning foundation models to perform these estimation problems. We first characterize an omitted variable bias that can arise when a foundation model is fine-tuned in the standard way: to minimize predictive error. We then provide a set of conditions for fine-tuning under which estimates derived from a foundation model are √n-consistent. Based on this theory, we develop fine-tuning algorithms that empirically mitigate this omitted variable bias. To demonstrate our ideas, we study gender wage gap estimation. Classical methods for estimating the adjusted wage gap employ simple predictive models of wages, which can induce omitted variable bias because they condition on coarse summaries of career history. Instead, we use a custom-built foundation model, capturing a richer representation of career history. Using data from the Panel Study of Income Dynamics, we find that career history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of career history that are omitted by standard models but are important for explaining the gap.
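To make the omitted variable bias described in the abstract concrete, the following is a minimal, self-contained sketch, not the authors' method or data. It uses synthetic careers in which the timing of employment gaps (which a coarse summary such as "total years worked" discards) is correlated with gender, so an adjusted wage gap regression conditioning only on the coarse summary attributes part of the history effect to gender, while conditioning on the full history recovers the true direct coefficient. All variable names and the data-generating process are illustrative assumptions.

```python
# Illustrative sketch (synthetic data): omitted variable bias from coarse
# career-history controls in an adjusted wage gap regression.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic gender indicator (1 = woman, purely illustrative).
gender = rng.integers(0, 2, size=n).astype(float)

# Per-year employment indicators over 10 years. Women are assumed (for
# illustration) to be more likely to have interruptions in recent years.
p_work = 0.8 - 0.3 * gender[:, None] * (np.arange(10) >= 7)
history = (rng.random((n, 10)) < p_work).astype(float)

# Log wages reward recent experience more than distant experience, plus a
# direct gender coefficient of -0.05 (the "true" adjusted gap here).
recency_weights = np.linspace(0.2, 1.0, 10)
log_wage = history @ recency_weights - 0.05 * gender + rng.normal(0, 0.3, n)

def adjusted_gap(controls):
    """OLS coefficient on gender after conditioning on the given controls."""
    X = np.column_stack([np.ones(n), gender, controls])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return beta[1]

coarse = history.sum(axis=1, keepdims=True)  # total years worked only
rich = history                               # stand-in for a richer learned representation

print("adjusted gap, coarse summary: ", adjusted_gap(coarse))
print("adjusted gap, full history:   ", adjusted_gap(rich))
```

With the coarse control, the gender coefficient absorbs the unmodeled timing of career interruptions and overstates the gap; with the full history, the estimate is close to the simulated direct effect of -0.05. The paper's contribution, by contrast, concerns how to fine-tune a foundation model so that its learned history representation can play the role of the richer control without introducing this bias.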