Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
Traditional GWAS has advanced our understanding of complex diseases but often
misses nonlinear genetic interactions. Deep learning offers new opportunities
to capture complex genomic patterns, yet existing methods mostly depend on
feature selection strategies that either constrain analysis to known pathways
or risk data leakage when applied across the full dataset. Further, covariates
can inflate predictive performance without reflecting true genetic signals. We
compare deep learning architectures for GWAS and demonstrate that careful
architectural choices can outperform existing methods under strict
no-leakage conditions. Building on this, we extend our approach to a
multi-label framework that jointly models five diseases, leveraging shared
genetic architecture for improved efficiency and discovery. Applied to five
million SNPs across 37,000 samples, our method achieves competitive predictive
performance (AUC 0.68-0.96), offering a scalable, leakage-free, and
biologically meaningful approach for multi-disease GWAS analysis.
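
The leakage risk mentioned above can be illustrated with a small synthetic sketch. It is not the authors' pipeline. It assumes pure-noise genotypes with no true signal, a simple correlation filter for feature selection, and a correlation-weighted linear score as the classifier. All names (`corr`, `auc`, `evaluate`) and sizes are hypothetical choices for illustration. Selecting SNPs on the full dataset lets information from the held-out samples shape the feature set, so the held-out AUC looks better than chance even though no signal exists. Selecting on the training split alone avoids this.

```python
import numpy as np

def corr(X, y):
    """Signed Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return (Xc.T @ yc) / denom

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC for binary labels in {0, 1}."""
    ranks = np.argsort(np.argsort(scores)) + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def evaluate(X, y, train, test, select_on, k=50):
    """Pick the top-k features using the rows in `select_on`, fit weights on
    `train` only, and report AUC on `test`."""
    top = np.argsort(-np.abs(corr(X[select_on], y[select_on])))[:k]
    X_train = X[train][:, top]
    weights = corr(X_train, y[train])  # weights always come from train
    scores = (X[test][:, top] - X_train.mean(axis=0)) @ weights
    return auc(scores, y[test])

rng = np.random.default_rng(0)
n_samples, n_snps = 200, 5000

# Assumption: random 0/1/2 genotype codes that are independent of the labels.
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
y = np.tile([0, 1], n_samples // 2)  # balanced labels, unrelated to X

train = np.arange(0, 140)
test = np.arange(140, n_samples)
everyone = np.arange(n_samples)

leaky_auc = evaluate(X, y, train, test, select_on=everyone)  # selection sees test rows
clean_auc = evaluate(X, y, train, test, select_on=train)     # selection sees train only

print(f"selection on all samples: AUC = {leaky_auc:.2f}")
print(f"selection on train only:  AUC = {clean_auc:.2f}")
```

Because the data contain no real association, any held-out AUC well above 0.5 in the first setting is an artifact of the selection step, which is the kind of inflation a strict no-leakage protocol is meant to rule out.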