Predicting enhancer-gene links from single-cell multi-omics data by integrating prior Hi-C information
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Enhancers play an important role in transcriptional regulation by modulating gene expression from distal genomic locations. Although single-cell ATAC and RNA sequencing (scATAC/RNA-seq) data have been leveraged to infer enhancer-gene links, establishing regulatory links between enhancers and their target genes from these data remains challenging because they lack chromatin conformation information. Here, we present SCEG-HiC, a machine learning method based on the weighted graphical lasso that decodes enhancer-gene links from single-cell multi-omics data by integrating bulk average Hi-C data as prior knowledge. Comprehensive evaluation across ten single-cell multi-omics datasets from humans and mice demonstrates that SCEG-HiC outperforms existing single-cell models, whether it is applied to paired scATAC/RNA-seq data or to scATAC-seq data alone. Application of SCEG-HiC to COVID-19 datasets illustrates its capacity to reconstruct the gene regulatory networks underlying disease severity more reliably and to elucidate functional associations between non-coding variants and their putative target genes. SCEG-HiC is freely available as an open-source, user-friendly R package, facilitating broad applications in regulatory genomics research.
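The abstract does not give SCEG-HiC's objective or how the Hi-C prior enters it, so the following is only a minimal Python sketch of a generic weighted graphical lasso, not the package's R implementation. It estimates a sparse precision matrix Theta over features (for example, peak accessibility and gene expression) by minimizing -log det(Theta) + trace(S Theta) + lam * sum_ij W_ij |Theta_ij| with ADMM, and reads nonzero off-diagonal entries as candidate links. The helper `hic_to_penalty_weights`, its 1 / (1 + contact) mapping, and the toy data are hypothetical assumptions illustrating one way a contact prior could lower the penalty on frequently contacting pairs.

```python
import numpy as np


def soft_threshold(x, thresh):
    """Elementwise soft-thresholding: sign(x) * max(|x| - thresh, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


def weighted_graphical_lasso(S, W, lam=0.1, rho=1.0, n_iter=500, tol=1e-6):
    """Sparse precision matrix via ADMM for the weighted graphical lasso.

    Minimizes -log det(Theta) + trace(S @ Theta) + lam * sum_ij W[i, j] * |Theta[i, j]|,
    where S is an empirical covariance matrix and W is a symmetric,
    nonnegative matrix of per-entry penalty weights.
    """
    p = S.shape[0]
    Z = np.eye(p)          # sparse copy of Theta (the returned estimate)
    U = np.zeros((p, p))   # scaled dual variable
    penalty = lam * W
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - inv(Theta) = rho*(Z - U) - S in closed
        # form using the eigendecomposition of the right-hand side.
        evals, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_evals = (evals + np.sqrt(evals ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_evals) @ Q.T
        # Z-update: weighted soft-thresholding; larger W[i, j] -> more shrinkage.
        Z_old = Z
        Z = soft_threshold(Theta + U, penalty / rho)
        # Dual update on the constraint Theta == Z.
        U = U + Theta - Z
        if np.linalg.norm(Theta - Z) < tol and np.linalg.norm(Z - Z_old) < tol:
            break
    return Z


def hic_to_penalty_weights(contacts):
    """HYPOTHETICAL prior mapping: stronger Hi-C contact -> smaller penalty."""
    W = 1.0 / (1.0 + contacts)
    np.fill_diagonal(W, 0.0)  # leave the diagonal unpenalized
    return W


# Toy usage with simulated data (5 hypothetical features, e.g. peaks and genes).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))        # 200 cells x 5 features
S = np.cov(X, rowvar=False)
contacts = np.zeros((5, 5))
contacts[0, 1] = contacts[1, 0] = 50.0   # hypothetical strong Hi-C contact
W = hic_to_penalty_weights(contacts)
W[3, 4] = W[4, 3] = 1e6                  # effectively forbid the 3-4 link
Theta = weighted_graphical_lasso(S, W, lam=0.1)
links = np.argwhere(np.triu(Theta, k=1) != 0)  # candidate links (i, j), i < j
```

The per-entry weight matrix, rather than a single scalar penalty, is what lets prior information bias which links survive shrinkage. ADMM is used here only because each step has a closed-form update; an existing solver that accepts a penalty matrix, such as the `rho` argument of the R `glasso` package, would fill the same role.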