scMalignantFinder distinguishes malignant cells in single-cell and spatial transcriptomics by leveraging cancer signatures.
Journal:
Communications biology
PMID:
40148533
Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing tumor heterogeneity, yet accurately identifying malignant cells remains challenging. Here, we propose scMalignantFinder, a machine learning tool specifically designed to distinguish malignant cells from their normal counterparts using a data- and knowledge-driven strategy. To develop the tool, multiple cancer datasets were collected, and the initially annotated malignant cells were calibrated using nine carefully curated pan-cancer gene signatures, resulting in over 400,000 single-cell transcriptomes for training. The union of differentially expressed genes across datasets was taken as the features for model construction to comprehensively capture tumor transcriptional diversity. scMalignantFinder outperformed existing automated methods across two gold-standard and eleven patient-derived scRNA-seq datasets. The capability to predict malignancy probability empowers scMalignantFinder to capture dynamic characteristics during tumor progression. Furthermore, scMalignantFinder holds the potential to annotate malignant regions in tumor spatial transcriptomics. Overall, we provide an efficient tool for detecting heterogeneous malignant cell populations.