SCIG: Machine learning uncovers cell identity genes in single cells by genetic sequence codes.

Journal: Nucleic Acids Research
Published Date:

Abstract

Deciphering cell identity genes is pivotal to understanding cell differentiation, development, and diseases that involve cell identity dysregulation. Here, we introduce SCIG, a machine-learning method to uncover cell identity genes (CIGs) in single cells. In alignment with recent reports that CIGs are regulated with unique epigenetic signatures, we found that CIGs exhibit distinctive genetic sequence signatures, e.g. unique enrichment patterns of cis-regulatory elements. Using these genetic sequence signatures, along with gene expression information from single-cell RNA-seq data, SCIG uncovers the identity genes of a cell without the need for comparison to other cells. The CIG score defined by SCIG outperformed expression value in network analysis for revealing the master transcription factors (TFs) that regulate cell identity. Applying SCIG to the human endothelial cell atlas revealed that the tissue microenvironment is a critical supplement to master TFs for cell identity refinement. SCIG is publicly available at https://doi.org/10.5281/zenodo.14726426, offering a valuable tool for advancing research in cell differentiation, development, and regenerative medicine.

Authors

  • Kulandaisamy Arulsamy
    Basic and Translational Research Division, Department of Cardiology, Boston Children's Hospital, Boston, MA 02115, United States.
  • Bo Xia
    Center for Bioinformatics and Computational Biology, Houston Methodist Research Institute, Houston, TX, USA.
  • Yang Yu
    Division of Cardiology, the Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Hong Chen
    Department of Nephrology, The Second Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China.
  • William T Pu
    Department of Cardiology, Boston Children's Hospital, Boston, MA, USA. william.pu@cardio.chboston.org.
  • Lili Zhang
    Pharmaceutics Department, Institute of Medicinal Biotechnology, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, 100050, PR China.
  • Kaifu Chen
    Center for Bioinformatics and Computational Biology, Houston Methodist Research Institute, Houston, TX, USA. kchen2@houstonmethodist.org.