Equitable machine learning counteracts ancestral bias in precision medicine.
Journal: Nature Communications
PMID: 40064867
Abstract
Gold-standard genomic datasets severely underrepresent non-European populations, leading to inequities and a limited understanding of human disease. Therapeutics and outcomes remain hidden because we lack insights that could be gained from analyzing ancestrally diverse genomic data. To address this significant gap, we present PhyloFrame, a machine learning method for equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating functional interaction networks and population genomics data with transcriptomic training data. Application of PhyloFrame to breast, thyroid, and uterine cancers shows marked improvements in predictive power across all ancestries, less model overfitting, and a higher likelihood of identifying known cancer-related genes. Validation in fourteen ancestrally diverse datasets demonstrates that PhyloFrame is better able to adjust for ancestral bias across all populations. In particular, the ability to provide accurate predictions for underrepresented groups is substantially increased. Analysis of performance in the most diverse continental ancestry group, African, illustrates how phylogenetic distance from training data negatively impacts model performance, as well as PhyloFrame's capacity to mitigate these effects. These results demonstrate how equitable artificial intelligence (AI) approaches can mitigate ancestral bias in training data and contribute to equitable representation in medical research.