Machine Learning and Micro Capture-C resolve GWAS associations revealing endothelial stress pathways in Coronary Artery Disease

Journal: medRxiv
Published Date:

Abstract

Resolving the gene targets of non-coding genetic variation is the major bottleneck in translating genome-wide association studies (GWAS) into a mechanistic understanding of complex diseases such as coronary artery disease (CAD). By combining new Transformer-based machine learning (ML) approaches trained on cardiovascular epigenetics with high-resolution, allele-specific genomic and transcriptomic technologies, we create a highly scalable platform that simultaneously resolves causal variants, cell type of action, output gene, and direction of effect. When applied to CAD genetics, our ML approach predicts causal variants from 20,747 candidate SNPs across 9 vessel cell types and identifies disrupted transcription factor binding motifs using ML feature attributions. We investigate 94 of the top predictions in endothelial cells using Micro Capture-C, revealing the importance of fluid shear stress and TGF-β signaling pathways. We exploit allelic skew in heterozygous cells to establish both variant causality and effect direction, demonstrating that this platform can be used to rapidly resolve non-coding genetics in complex disease.

Authors

  • Matthew Baxter; Edward Sanders; Simone G Riva; Joseph C Hamley; E Ravza Gur; James L T Dalgleish; Isabella M Freund; Nigel Roberts; Gabrielle Raymond; Martin Sergeant; Damien J Downes; Hangpeng Li; David G McVey; Shu Ye; Christopher Grace; Lance D Hentges; Gwendal Dujardin; Tom R Webb; James O J Davies; Theodosios Kyriakou; Anuj Goel; Hugh Watkins; Jim R Hughes