CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data.

Journal: Cell genomics
PMID:

Abstract

Exploratory analysis of single-cell RNA sequencing (scRNA-seq) typically relies on hard clustering over two-dimensional projections like uniform manifold approximation and projection (UMAP). However, such methods can severely distort the data and depend on many arbitrary parameter choices. Methods that can model scRNA-seq data as non-discrete "gene expression programs" (GEPs) can better preserve the data's structure, but currently they often scale poorly, are inconsistent across repeated runs, and lack an established method for choosing key parameters. Here, we developed a GPU-based unsupervised learning approach, "consensus and scalable inference of gene expression programs" (CSI-GEP). We show that CSI-GEP can recover ground truth GEPs in real and simulated atlas-scale scRNA-seq datasets, significantly outperforming cutting-edge methods, including GPT-based neural networks. We applied CSI-GEP to a whole mouse brain atlas of 2.2 million cells, disentangling endothelial cell types missed by other methods, and to an integrated scRNA-seq atlas of human tumors and cell lines, discovering mesenchymal-like GEPs unique to cancer cells growing in culture.

Authors

  • Xueying Liu
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • Richard H Chapple
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • Declan Bennett
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • William C Wright
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • Ankita Sanjali
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • Erielle Culp
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
  • Yinwen Zhang
    Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
  • Min Pan
    School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
  • Paul Geeleher
    Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA. paul.geeleher@gmail.com.