Machine Learning and Micro Capture-C resolve GWAS associations revealing endothelial stress pathways in Coronary Artery Disease
Journal:
medRxiv
Published Date:
Jan 1, 2025
Abstract
Resolving the gene targets of non-coding genetic variation is the major bottleneck in translating genome-wide association studies (GWAS) into a mechanistic understanding of complex diseases such as coronary artery disease (CAD). By combining new Transformer-based machine learning (ML) approaches trained on cardiovascular epigenetics with high-resolution, allele-specific genomic and transcriptomic technologies, we create a highly scalable platform that simultaneously resolves the causal variant, cell type of action, output gene, and direction of effect. When applied to CAD genetics, our ML predicts causal variants from 20,747 candidate SNPs across 9 vessel cell types and identifies disrupted transcription factor binding motifs using ML feature attributions. We investigate 94 of the top predictions in endothelial cells using Micro Capture-C, revealing the importance of the fluid shear stress and TGF-β signaling pathways. We exploit allelic skew in heterozygous cells to establish both variant causality and direction of effect, showing that this platform can be used to rapidly resolve non-coding genetics in complex disease.
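The allelic-skew reasoning can be made concrete with a small sketch. At a heterozygous site, reads carrying the reference and alternate alleles are expected in roughly equal numbers if the variant has no effect, so a significant departure from a 50:50 split suggests an allele-specific effect, and the favored allele gives its direction. The Python below is only an illustration under that assumption: it uses an exact two-sided binomial test, which is not necessarily the statistical method used in the study, and the function names and read counts are hypothetical.

```python
from math import exp, lgamma, log


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. Log-gamma is used so large read depths do not
    overflow.
    """
    def log_pmf(i: int) -> float:
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    pmf = [exp(log_pmf(i)) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def allelic_skew(ref_reads: int, alt_reads: int):
    """Return (alt_fraction, p_value, favored_allele) for one heterozygous site.

    ref_reads and alt_reads are hypothetical allele-specific read counts,
    for example from an allele-resolved sequencing assay.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no allele-specific reads at this site")
    alt_fraction = alt_reads / n
    p_value = binom_two_sided_p(alt_reads, n)
    if alt_fraction > 0.5:
        favored = "alt"
    elif alt_fraction < 0.5:
        favored = "ref"
    else:
        favored = "none"
    return alt_fraction, p_value, favored


if __name__ == "__main__":
    # Invented counts, for illustration only.
    frac, p, favored = allelic_skew(ref_reads=30, alt_reads=70)
    print(f"alt fraction={frac:.2f}, p={p:.3g}, favored allele={favored}")
```

In practice such a test would be run per site with correction for multiple testing and for technical biases such as reference mapping bias, neither of which this sketch addresses.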