Machine Learning-Based Prediction of Cell-type Resolved Brain eQTLs Enhances Discovery of Variants Explaining Alzheimer’s Disease Heritability

Journal: medRxiv
Published Date:

Abstract

The majority of causal genome-wide association study (GWAS) variants for Alzheimer’s disease (AD) are believed to reside in noncoding regions of the genome, where they likely affect gene regulation, particularly in microglia. Although expression quantitative trait locus (eQTL) studies offer valuable insights into gene regulation, they tend to identify variants in the promoter regions of genes under weaker selection. In contrast, GWAS variants are often found in enhancer regions linked to genes under stronger selection. To address this discrepancy, we developed predictive models, called single-cell Enhanced Expression Modifier Scores (scEEMS), to identify cell type-specific eQTLs using 4,839 genomic features, including deep learning-based scores that predict the effects of variants on various molecular phenotypes. These models were trained on fine-mapped single-cell eQTLs from six cell types and exhibited strong performance, with an average cross-validation area under the precision-recall curve (AUPRC) of 0.67. Notably, for microglia, the predicted eQTLs explained 15.3% of AD GWAS heritability, with a 140.6-fold enrichment of heritability (p = 5.15 × 10⁻⁶), compared to just 5% of heritability explained by fine-mapped eQTLs. Incorporating scEEMS predictions as priors in eQTL fine-mapping refined credible sets and resulted in a net gain of 107 eGenes in microglia and 271 eGenes in astrocytes, reflecting improved statistical power that both identified new associations and filtered false positives. We used these models to link variants to cell type-specific genes and then applied the eMAGMA framework to nominate cell type-specific AD risk genes based on GWAS data. Our eMAGMA approach using predicted eQTLs identified 215 cell type-gene pairs (111 unique genes), substantially more than the 76 pairs (55 unique genes) found using fine-mapped eQTLs. Among these, we identified 43 microglia-specific, 18 astrocyte-specific, and 15 oligodendrocyte-specific AD risk genes.
Of the 215 pairs, 62 replicated in at least one non-European population (African-American, Hispanic, or East Asian), compared to only 29 from fine-mapped eQTLs. Importantly, 18 of the replicated pairs represent novel discoveries not found by standard MAGMA or fine-mapped eMAGMA analyses. Five cell type-gene pairs replicated across European and at least two non-European populations: BIN1 (microglia), PICALM (microglia), ABCA7 (astrocyte and oligodendrocyte), and ARHGAP45 (microglia).
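
As a rough illustration of how the microglia heritability figures relate to one another, the sketch below assumes the common convention that fold enrichment is the share of heritability explained by an annotation divided by the share of variants in that annotation. The abstract does not state which estimator was used, so this convention and the function names are assumptions for illustration only, not the authors' method.

```python
def implied_variant_fraction(h2_share: float, enrichment: float) -> float:
    """Share of variants in an annotation implied by its heritability share
    and fold enrichment, assuming enrichment = h2_share / variant_share."""
    if enrichment <= 0:
        raise ValueError("enrichment must be positive")
    return h2_share / enrichment


def fold_enrichment(h2_share: float, variant_share: float) -> float:
    """Fold enrichment under the same assumed definition."""
    if variant_share <= 0:
        raise ValueError("variant_share must be positive")
    return h2_share / variant_share


# Figures quoted in the abstract for predicted microglia eQTLs.
microglia_h2_share = 0.153      # 15.3% of AD GWAS heritability
microglia_enrichment = 140.6    # 140.6-fold enrichment

# Share of variants the annotation would cover under the assumed definition.
variant_share = implied_variant_fraction(microglia_h2_share, microglia_enrichment)
print(f"Implied share of variants in the annotation: {variant_share:.4%}")
```

Under this assumption, a large enrichment paired with a moderate heritability share implies that the predicted eQTL annotation covers only a small fraction of all variants.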

Authors

  • Chirag M Lakhani; Giacomo Cavalca; Anjing Liu; Rohan Nidumbur; Ru Feng; Towfique Raj; Philip De Jager; Gao Wang; David A. Knowles