Bridging Ancestry Gaps in Genomic Risk Prediction with Tabular Foundation Models

Journal: bioRxiv

Published Date: Jun 2, 2026

Abstract

Motivation: Models deployed for genomic prediction of diseases perform unevenly across populations, limiting clinical utility. Two factors drive this limitation: large imbalances in sample availability across ancestry groups and non-stationarity of genotype-phenotype effect sizes across the ancestry continuum. While tabular foundation models with in-context learning (ICL) have shown strong sample efficiency in other domains, their effectiveness for genotype-to-phenotype prediction and their robustness to ancestry-driven effect heterogeneity remain unclear.

Results: Using large, ancestrally diverse biobank data, we show that ICL-capable tabular foundation models reduce performance degradation in under-sampled ancestry groups compared to conventional supervised approaches. However, we find that prevailing models trained on existing synthetic tabular tasks fail when allele effect sizes vary across ancestry space. Treating genetic ancestry as a continuous variable, we introduce an instruction-tuning framework that exposes models to synthetic tasks with ancestry-dependent non-stationary effects. Instruction-tuned models achieve improved and more stable predictive performance across the genetic ancestry continuum, including for individuals distant from in-context exemplars in ancestry space.
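
To make the idea of "synthetic tasks with ancestry-dependent non-stationary effects" concrete, the following is a minimal illustrative sketch, not the authors' generator. It assumes a one-dimensional ancestry coordinate in [0, 1], a linear drift of per-variant effect sizes along that coordinate, and a median-thresholded liability to produce a binary phenotype. The function name `make_synthetic_task` and all of its parameters are hypothetical.

```python
import numpy as np


def make_synthetic_task(n_samples=500, n_variants=20, drift=1.0, seed=0):
    """Simulate one toy genotype-to-phenotype task (illustrative assumption only).

    Effect sizes follow beta_j(a) = beta0_j + drift * delta_j * a, where a is a
    continuous ancestry coordinate. drift=0 gives stationary effects.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical 1-D ancestry coordinate (a stand-in for, e.g., a genetic PC).
    ancestry = rng.uniform(0.0, 1.0, size=n_samples)

    # Allele frequencies shift mildly with ancestry (assumed, for illustration).
    base_freq = rng.uniform(0.1, 0.5, size=n_variants)
    freq_slope = rng.uniform(-0.05, 0.05, size=n_variants)
    freq = np.clip(base_freq + np.outer(ancestry, freq_slope), 0.01, 0.99)

    # Genotypes coded as 0/1/2 copies of the alternate allele.
    genotypes = rng.binomial(2, freq)  # shape (n_samples, n_variants)

    # Per-individual effect sizes that vary along the ancestry coordinate.
    beta0 = rng.normal(0.0, 1.0, size=n_variants)
    delta = rng.normal(0.0, 1.0, size=n_variants)
    beta = beta0 + drift * np.outer(ancestry, delta)  # (n_samples, n_variants)

    # Liability = genetic score + noise; threshold at the median for a binary label.
    liability = np.sum(genotypes * beta, axis=1) + rng.normal(0.0, 1.0, n_samples)
    phenotype = (liability > np.median(liability)).astype(int)
    return genotypes, ancestry, phenotype


if __name__ == "__main__":
    G, a, y = make_synthetic_task()
    print(G.shape, a.shape, y.shape)
```

Setting `drift=0` recovers a task with the same effect sizes everywhere in ancestry space, so comparing a model on `drift=0` versus `drift>0` tasks is one simple way to probe sensitivity to effect heterogeneity under these assumptions.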

Authors

Das, A.; Cui, Y.

