SCIG: Machine learning uncovers cell identity genes in single cells by genetic sequence codes.
Journal:
Nucleic acids research
Published Date:
May 22, 2025
Abstract
Deciphering cell identity genes is pivotal to understanding cell differentiation, development, and cell identity dysregulation involving diseases. Here, we introduce SCIG, a machine-learning method to uncover cell identity genes in single cells. In alignment with recent reports that cell identity genes (CIGs) are regulated with unique epigenetic signatures, we found CIGs exhibit distinctive genetic sequence signatures, e.g. unique enrichment patterns of cis-regulatory elements. Using these genetic sequence signatures, along with gene expression information from single-cell RNA-seq data, SCIG uncovers the identity genes of a cell without a need for comparison to other cells. CIG score defined by SCIG surpassed expression value in network analysis to reveal the master transcription factors (TFs) regulating cell identity. Applying SCIG to the human endothelial cell atlas revealed that the tissue microenvironment is a critical supplement to master TFs for cell identity refinement. SCIG is publicly available at https://doi.org/10.5281/zenodo.14726426 , offering a valuable tool for advancing cell differentiation, development, and regenerative medicine research.