Supervised Domain Adaptation Mitigates Cross-Ethnicity Prediction Error in Neuroimaging Based Cognitive Prediction
Journal:
bioRxiv
Published Date:
May 28, 2026
Abstract
Machine-learning models are increasingly used to predict cognitive and clinical outcomes from neuroimaging data, yet fairness and generalizability remain key challenges. Large-scale datasets are often demographically imbalanced, leading to systematic performance disparities across ethnic groups, with models typically performing better for majority populations. Here, we examine whether supervised domain adaptation can mitigate such bias. Using the ABCD dataset, we treat White American participants as the source domain and African American participants as the target domain. We compare four domain-adaptation methods (balanced weighting, two-stage TrAdaBoost, feature augmentation with SrcOnly prediction, and linear interpolation) against standard training in predicting cognition from 80 MRI measures. All methods reduced prediction error for African American participants, particularly for MRI measures with large baseline disparities (e.g., structural MRI), while offering limited gains where initial gaps were small (e.g., functional connectivity). Balanced weighting performed best, showing that simple, low-cost approaches can effectively reduce cross-ethnicity performance gaps for underrepresented populations.
Significance Statement
Large-scale neuroimaging datasets increasingly enable machine-learning models to predict cognitive and clinical outcomes; however, these datasets are often ethnically imbalanced. As a result, predictive models tend to generalize poorly to underrepresented populations. We demonstrate that, across 80 MRI phenotypes, a class of machine-learning approaches collectively known as supervised domain adaptation can substantially reduce cross-ethnicity disparities in neuroimaging-based cognitive prediction, even when only limited data from underrepresented groups are available. Among the methods evaluated, balanced weighting achieved the best performance while maintaining low computational cost. Together, these findings provide a practical and scalable framework for improving fairness and generalizability in neuroimaging-based machine learning under realistic conditions of ethnic imbalance.
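To make the idea of balanced weighting concrete, the following is a minimal sketch of one common reading of the term: each sample is weighted inversely to the size of its domain, so that the source and target domains contribute equally to the training loss. The abstract does not specify the exact weighting scheme or the prediction model used, so the function name `balanced_weights`, the domain labels, and the toy sample counts are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter


def balanced_weights(domains):
    """Return one weight per sample so every domain's weights sum to the same total.

    `domains` is a sequence of domain labels (hypothetical names such as
    "source" and "target"). Weights are scaled to sum to len(domains),
    which keeps the overall loss on the same scale as unweighted training.
    """
    counts = Counter(domains)
    n_samples = len(domains)
    n_domains = len(counts)
    # Each domain receives a total weight of n_samples / n_domains,
    # split evenly among that domain's samples.
    return [n_samples / (n_domains * counts[d]) for d in domains]


# Toy, made-up imbalance: 8 source samples and 2 target samples.
domains = ["source"] * 8 + ["target"] * 2
weights = balanced_weights(domains)

# Total weight carried by each domain after balancing.
source_total = sum(w for w, d in zip(weights, domains) if d == "source")
target_total = sum(w for w, d in zip(weights, domains) if d == "target")
print(source_total, target_total)
```

Weights of this kind could be passed to any estimator that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `Ridge.fit`; whether the study used that estimator is not stated here.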