Predicting enhancer-gene links from single-cell multi-omics data by integrating prior Hi-C information

Journal: bioRxiv
Published Date:

Abstract

Enhancers play an important role in transcriptional regulation by modulating gene expression from distal genomic locations. Although single-cell ATAC and RNA sequencing (scATAC/RNA-seq) data have been leveraged to infer enhancer-gene links, establishing regulatory links between enhancers and their target genes remains a challenge due to the absence of chromatin conformation information. Here, we present SCEG-HiC, a machine learning method based on weighted graphical lasso, which decodes enhancer-gene links from single-cell multi-omics data by integrating bulk average Hi-C data as prior knowledge. Comprehensive evaluation across ten single-cell multi-omics datasets from both humans and mice demonstrates that SCEG-HiC outperforms existing single-cell models, whether using paired scATAC/RNA-seq data or scATAC-seq data alone. Application of SCEG-HiC to COVID-19 datasets illustrates its capacity to reconstruct gene regulatory networks underlying disease severity more reliably, and to elucidate functional associations between non-coding variants and their putative target genes. SCEG-HiC is freely available as an open-source, user-friendly R package, facilitating broad applications in regulatory genomics research.
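
To make the idea of a weighted graphical lasso with a Hi-C prior concrete, the following is a minimal Python sketch, not the SCEG-HiC implementation. It solves the generic weighted graphical lasso objective, min over Theta of -log det(Theta) + tr(S Theta) + sum_ij penalty_ij |Theta_ij|, with a plain ADMM loop. Everything specific here is an assumption made for illustration: the function names `weighted_graphical_lasso` and `hic_to_penalty`, the rule `penalty = lam * (1 - contact)` that turns a normalized contact frequency into a penalty, and the toy covariance and contact matrices. The abstract does not state how SCEG-HiC maps Hi-C signal to penalties.

```python
import numpy as np


def hic_to_penalty(contact, lam=0.5):
    """Hypothetical mapping (an assumption, not the paper's rule):
    stronger Hi-C contact (values in [0, 1]) gives a weaker L1 penalty."""
    penalty = lam * (1.0 - contact)
    np.fill_diagonal(penalty, 0.0)  # leave the diagonal unpenalized
    return penalty


def weighted_graphical_lasso(S, penalty, rho=1.0, n_iter=500, tol=1e-8):
    """ADMM for: min -logdet(Theta) + tr(S Theta) + sum_ij penalty_ij * |Theta_ij|."""
    p = S.shape[0]
    Z = np.eye(p)          # sparse copy of the precision matrix
    U = np.zeros((p, p))   # scaled dual variable
    for _ in range(n_iter):
        # Theta update: closed form from the eigendecomposition of rho*(Z - U) - S
        evals, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_evals = (evals + np.sqrt(evals ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_evals) @ Q.T
        # Z update: elementwise soft-thresholding with pair-specific thresholds
        A = Theta + U
        Z_new = np.sign(A) * np.maximum(np.abs(A) - penalty / rho, 0.0)
        # Dual update
        U = U + Theta - Z_new
        done = (np.linalg.norm(Z_new - Z) < tol
                and np.linalg.norm(Theta - Z_new) < tol)
        Z = Z_new
        if done:
            break
    return Z


# Toy example: index 0 is a gene, indices 1-3 are candidate enhancer peaks.
# S stands in for a covariance matrix of aggregated expression/accessibility.
S = np.array([
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.1, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Made-up normalized Hi-C contacts: the gene contacts peak 1 strongly.
contact = np.zeros((4, 4))
contact[0, 1] = contact[1, 0] = 0.9
contact[0, 2] = contact[2, 0] = 0.1
contact[0, 3] = contact[3, 0] = 0.1

theta = weighted_graphical_lasso(S, hic_to_penalty(contact))
linked_peaks = [j for j in range(1, 4) if theta[0, j] != 0.0]
print(linked_peaks)
```

In this toy setting the gene-peak pair with strong contact support keeps a nonzero entry in the estimated precision matrix, while the weakly supported pairs are shrunk to exactly zero. That is the qualitative behavior a contact-informed penalty is meant to produce; a real analysis would need the package's own preprocessing and penalty definition.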

Authors

  • Xuan Liang; Yuanyuan Miao; Dongmei Han; Yurun Li; Wenwen Zhang; Zhen Wang