Iterative clustering algorithm G-DESC-E and pan-cancer key gene analysis based on single-cell sequencing data.
Journal:
Briefings in Bioinformatics
Published Date:
Jul 2, 2025
Abstract
Single-cell sequencing has transformed cancer genomics by enabling researchers to examine gene expression profiles at the resolution of individual cells. Although the technology is widely used to study cancer gene states, pan-cancer analyses remain relatively underexplored. In this study, we propose the G-DESC-E algorithm, which distinguishes dimensionality-reduced data using a grid-based approach, filters out outliers during preprocessing, and uses the Louvain algorithm to prescreen cluster centroids that serve as the initial clusters. We construct an objective function that integrates label entropy with the Kullback-Leibler divergence and obtain the final clustering through iterative optimization. Our results indicate that G-DESC-E improves clustering accuracy. Applied to real-world datasets, the method identifies critical transcriptional features associated with distinct cancer subtypes. Combining clustering visualization with gene ontology analysis, we identify more than thirty genes potentially related to cancer occurrence and progression. This study applies single-cell sequencing to key gene analysis in a pan-cancer setting for the first time, and the algorithm and research framework presented here offer insights that can inform further clinical investigations.
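The abstract does not give the exact form of the objective, so the following is only a minimal sketch of how a label-entropy term might be combined with a Kullback-Leibler clustering loss. It assumes a DESC-style setup (Student's t soft assignments and a sharpened target distribution); the function names, the weight `lam`, and the choice of a simple weighted sum are all hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster memberships from a Student's t kernel (assumed, DESC-style)."""
    # Squared Euclidean distance between every cell embedding and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target P: square Q, divide by cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all cells and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def label_entropy(q, eps=1e-12):
    """Mean Shannon entropy of the per-cell soft labels."""
    return float(-np.mean(np.sum(q * np.log(q + eps), axis=1)))

def combined_objective(z, centroids, lam=0.1):
    """Hypothetical objective: KL term plus a weighted label-entropy term."""
    q = soft_assign(z, centroids)
    p = target_distribution(q)
    return kl_divergence(p, q) + lam * label_entropy(q)
```

In an iterative scheme of this kind, the centroids (here assumed to come from a Louvain prescreening step) would be updated to reduce `combined_objective` until the assignments stabilize; how G-DESC-E actually performs that update is not stated in the abstract.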