Estimating wage disparities using foundation models.
Journal: Proceedings of the National Academy of Sciences of the United States of America
Published Date: May 30, 2025
Abstract
The rise of foundation models marks a paradigm shift in machine learning: instead of training specialized models from scratch, foundation models are trained on massive datasets before being adjusted or fine-tuned to make predictions on smaller datasets. Initially developed for text, foundation models have also excelled at making predictions about social science data. However, while many estimation problems in the social sciences use prediction as an intermediate step, they ultimately require different criteria for success. In this paper, we develop methods for fine-tuning foundation models to perform these estimation problems. We first characterize an omitted variable bias that can arise when a foundation model is fine-tuned in the standard way: to minimize predictive error. We then provide a set of conditions for fine-tuning under which estimates derived from a foundation model are √n-consistent. Based on this theory, we develop fine-tuning algorithms that empirically mitigate this omitted variable bias. To demonstrate our ideas, we study gender wage gap estimation. Classical methods for estimating the adjusted wage gap employ simple predictive models of wages, which can induce omitted variable bias because they condition on coarse summaries of career history. Instead, we use a custom-built foundation model, capturing a richer representation of career history. Using data from the Panel Study of Income Dynamics, we find that career history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of career history that are omitted by standard models but are important for explaining the gap.
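To make the omitted variable bias described in the abstract concrete, the following is a minimal, self-contained sketch, not the authors' method or data. It uses synthetic careers in which the timing of employment gaps (which a coarse summary such as "total years worked" discards) is correlated with gender, so an adjusted wage gap regression conditioning only on the coarse summary attributes part of the history effect to gender, while conditioning on the full history recovers the true direct coefficient. All variable names and the data-generating process are illustrative assumptions.

```python
# Illustrative sketch (synthetic data): omitted variable bias from coarse
# career-history controls in an adjusted wage gap regression.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic gender indicator (1 = woman, purely illustrative).
gender = rng.integers(0, 2, size=n).astype(float)

# Per-year employment indicators over 10 years. Women are assumed (for
# illustration) to be more likely to have interruptions in recent years.
p_work = 0.8 - 0.3 * gender[:, None] * (np.arange(10) >= 7)
history = (rng.random((n, 10)) < p_work).astype(float)

# Log wages reward recent experience more than distant experience, plus a
# direct gender coefficient of -0.05 (the "true" adjusted gap here).
recency_weights = np.linspace(0.2, 1.0, 10)
log_wage = history @ recency_weights - 0.05 * gender + rng.normal(0, 0.3, n)

def adjusted_gap(controls):
    """OLS coefficient on gender after conditioning on the given controls."""
    X = np.column_stack([np.ones(n), gender, controls])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return beta[1]

coarse = history.sum(axis=1, keepdims=True)  # total years worked only
rich = history                               # stand-in for a richer learned representation

print("adjusted gap, coarse summary: ", adjusted_gap(coarse))
print("adjusted gap, full history:   ", adjusted_gap(rich))
```

With the coarse control, the gender coefficient absorbs the unmodeled timing of career interruptions and overstates the gap; with the full history, the estimate is close to the simulated direct effect of -0.05. The paper's contribution, by contrast, concerns how to fine-tune a foundation model so that its learned history representation can play the role of the richer control without introducing this bias.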