Deep learning powered single-cell clustering framework with enhanced accuracy and stability.

Journal: Scientific Reports
PMID:

Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular diversity. Unsupervised clustering, a key technique in this field, allows distinct cell types to be identified within a population. Graph-based deep clustering methods have shown promise in preserving the structural relationships between cells (nodes) in the data. However, these methods often neglect the inherent distribution of nodes in the graph, leading to incomplete representations of cell populations. In addition, conventional graph convolutional networks (GCNs) can suffer from oversmoothing, a phenomenon in which repeated neighborhood aggregation makes node representations increasingly similar, so that the network loses the ability to differentiate between samples with similar expression profiles. To address these limitations, we propose scG-cluster, a deep structural clustering method with two key innovations: (1) Dual-topology adjacency graph: scG-cluster integrates information about node distribution into the traditional adjacency graph used by GCNs, enriching the graph representation by capturing the spatial relationships between cells in addition to their pairwise similarities. (2) Dual-topology adaptive graph convolutional network (TAGCN): the framework employs a TAGCN architecture with residual concatenation. This network uses an attention mechanism to dynamically weight features within the graph, focusing on the aspects most informative for clustering, while residual connections counteract oversmoothing so that the network retains the ability to distinguish subtle differences between cell expression profiles. Furthermore, scG-cluster iteratively refines the cluster centers, improving the stability and accuracy of the final cluster assignments. Extensive evaluations on six diverse scRNA-seq datasets demonstrate that scG-cluster consistently outperforms existing state-of-the-art methods in both clustering accuracy and scalability.
Ablation studies further confirm that both the residual connections and the attention mechanism contribute significantly to the overall performance of the model. The source code for scG-cluster is publicly available at https://github.com/xixi-wq/scG-cluster.
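The abstract does not specify how node distribution is encoded in the dual-topology adjacency graph. The following is a minimal sketch in plain Python, assuming that one topology is a k-nearest-neighbor graph weighted by cosine similarity and that the second, distribution-aware topology can be approximated by shared-nearest-neighbor (SNN) overlap. The function names `dual_topology_adjacency` and `knn_sets` and the blending weight `alpha` are hypothetical and are not taken from the scG-cluster code. The blended graph is given self-loops and symmetrically normalized, as is standard for GCN input.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_sets(X, k):
    """Neighborhood of each cell: the cell itself plus its k most similar cells."""
    n = len(X)
    neigh = []
    for i in range(n):
        sims = sorted(((cosine(X[i], X[j]), j) for j in range(n) if j != i), reverse=True)
        neigh.append({i} | {j for _, j in sims[:k]})
    return neigh


def dual_topology_adjacency(X, k=2, alpha=0.5):
    """Blend a kNN similarity graph with an SNN overlap graph, then normalize.

    ASSUMPTION: the SNN (Jaccard overlap of neighborhoods) term stands in for
    the paper's node-distribution topology; the real construction may differ.
    """
    n = len(X)
    neigh = knn_sets(X, k)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Topology 1: symmetric kNN edge weighted by non-negative cosine similarity.
            linked = j in neigh[i] or i in neigh[j]
            knn = max(0.0, cosine(X[i], X[j])) if linked else 0.0
            # Topology 2: how much the two cells' neighborhoods overlap.
            snn = len(neigh[i] & neigh[j]) / len(neigh[i] | neigh[j])
            A[i][j] = alpha * knn + (1 - alpha) * snn
    # Self-loops plus symmetric normalization D^-1/2 (A + I) D^-1/2.
    for i in range(n):
        A[i][i] += 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
```

For example, `dual_topology_adjacency([[5, 0, 1], [4, 0, 1], [0, 6, 2], [0, 5, 3]], k=1)` links the first two cells and the last two cells while leaving the entries between the two groups at zero. Substituting a different distribution-aware topology only requires changing the `snn` term.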
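The combination of attention-based feature weighting and residual concatenation can be illustrated with a single graph-convolution step. This sketch is not the authors' TAGCN: it assumes a precomputed normalized adjacency `A_hat`, a weight matrix `W`, and one attention score per input feature that a softmax turns into weights, and the names `attentive_gcn_layer` and `att_scores` are hypothetical. What it shows is that concatenating a layer's input with its output keeps each cell's original features available even when propagation makes the new features of neighboring cells identical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_gcn_layer(A_hat, H, W, att_scores):
    """One graph-convolution step with feature attention and residual concatenation.

    ASSUMPTIONS (illustrative, not the paper's exact TAGCN):
    - att_scores holds one learnable score per input feature;
    - the output is [H || Z], so input features survive smoothing of Z.
    """
    alpha = softmax(att_scores)
    # Re-weight each feature column by its attention weight.
    H_att = [[h * a for h, a in zip(row, alpha)] for row in H]
    # Propagate over the normalized graph, project with W, apply ReLU.
    Z = [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, H_att), W)]
    # Residual concatenation: keep the input features alongside the new ones.
    return [h_row + z_row for h_row, z_row in zip(H, Z)]
```

With `A_hat = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]`, `H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]`, an identity `W`, and equal scores `[0.0, 0.0]`, the first two cells both receive the propagated features `[0.25, 0.25]`, yet their concatenated outputs `[1.0, 0.0, 0.25, 0.25]` and `[0.0, 1.0, 0.25, 0.25]` remain distinguishable.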
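The abstract states only that scG-cluster iteratively refines the cluster centers. One common way to do this in deep clustering is the Deep Embedded Clustering (DEC) scheme: a Student's t soft assignment Q, a sharpened target distribution P, and centers pulled toward confident assignments. The sketch below borrows that scheme as an assumption and further simplifies it by replacing DEC's gradient step on KL(P || Q) with a closed-form P-weighted mean; `refine_centers` and its helpers are hypothetical names.

```python
def soft_assign(Z, centers):
    """Student's t soft assignment q_ij (one degree of freedom), as in DEC."""
    Q = []
    for z in Z:
        w = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, mu))) for mu in centers]
        s = sum(w)
        Q.append([x / s for x in w])
    return Q


def target_distribution(Q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(col) for col in zip(*Q)]
    P = []
    for row in Q:
        w = [q * q / fj for q, fj in zip(row, f)]
        s = sum(w)
        P.append([x / s for x in w])
    return P


def refine_centers(Z, centers, n_iter=10):
    """Repeatedly move each center to the P-weighted mean of the embeddings.

    SIMPLIFICATION: DEC updates centers by gradient descent on KL(P || Q);
    this closed-form weighted-mean update is an illustrative stand-in.
    """
    dim = len(Z[0])
    for _ in range(n_iter):
        P = target_distribution(soft_assign(Z, centers))
        new_centers = []
        for j in range(len(centers)):
            wsum = sum(p[j] for p in P)
            new_centers.append([sum(p[j] * z[d] for p, z in zip(P, Z)) / wsum for d in range(dim)])
        centers = new_centers
    # Final hard labels: the center with the highest soft assignment.
    labels = [max(range(len(q)), key=q.__getitem__) for q in soft_assign(Z, centers)]
    return centers, labels
```

On the toy embeddings `[[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]` with initial centers `[[1.0, 1.0], [4.0, 4.0]]`, the centers move toward the two groups and the final labels are `[0, 0, 1, 1]`.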

Authors

  • Yi Zhang
    Department of Thyroid Surgery, China-Japan Union Hospital of Jilin University, Jilin University, Changchun, China.
  • Xi Feng
    Department of Pathology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
  • Yin Wang
    State Key Laboratory of ASIC and System, School of Microelectronics, Fudan University, Shanghai 200433, China.
  • Kai Shi
    Department of Anesthesiology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350212, People's Republic of China.