Equitable machine learning counteracts ancestral bias in precision medicine.

Journal: Nature Communications
PMID:

Abstract

Gold-standard genomic datasets severely under-represent non-European populations, leading to inequities and a limited understanding of human disease. Therapeutics and outcomes remain hidden because we lack insights that could be gained from analyzing ancestrally diverse genomic data. To address this significant gap, we present PhyloFrame, a machine learning method for equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating functional interaction networks and population genomics data with transcriptomic training data. Application of PhyloFrame to breast, thyroid, and uterine cancers shows marked improvements in predictive power across all ancestries, less model overfitting, and a higher likelihood of identifying known cancer-related genes. Validation in fourteen ancestrally diverse datasets demonstrates that PhyloFrame is better able to adjust for ancestry bias across all populations. The ability to provide accurate predictions for underrepresented groups, in particular, is substantially increased. Analysis of performance in the most diverse continental ancestry group, African, illustrates how phylogenetic distance from training data negatively impacts model performance, as well as PhyloFrame's capacity to mitigate these effects. These results demonstrate how equitable artificial intelligence (AI) approaches can mitigate ancestral bias in training data and contribute to equitable representation in medical research.

Authors

  • Leslie A Smith
    Department of Computer & Information Science & Engineering, University of Florida, 1889 Museum Rd, Gainesville, 32611, FL, USA.
  • James A Cahill
    Environmental Engineering Sciences Department, University of Florida, 365 Weil Hall, Gainesville, 32611, FL, USA.
  • Ji-Hyun Lee
    Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, CHA Bundang Medical Center, CHA University, Seongnam, Korea.
  • Kiley Graim
    Dept. of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. *Currently at the Flatiron Institute & Princeton University.