Comparative Analysis of Feature Selection Methods for Single-Cell RNA Sequencing Data

Journal: bioRxiv
Published Date:

Abstract

Feature selection is a critical preprocessing step in single-cell RNA sequencing (scRNA-seq) analysis, directly affecting downstream clustering and biological interpretation. We systematically compared 16 feature selection methods across three diverse datasets: PBMC3K (immune cells) and two spatial transcriptomics datasets, Visium Heart and Visium Brain. Methods included established approaches (Seurat HVG variants, CellRanger HVG), statistical methods (Pearson residuals, variance-based selection, coefficient of variation), supervised methods (ANOVA F-test, mutual information, random forest), and deep learning attribution techniques (IntegratedGradients, DeepLIFT, GradientShap). We evaluated the methods on execution time, feature overlap, marker recovery, and pathway enrichment. Our analysis revealed substantial variability across methods, with a mean pairwise overlap of only 23.7%. Nevertheless, a core set of 1,150 genes was consistently selected by at least 50% of the methods. Supervised methods showed superior recovery of known cell-type markers, while unsupervised approaches captured broader biological processes. Deep learning methods identified unique gene sets with strong immune pathway enrichment, but at a higher computational cost. Pathway analysis showed that all methods identified relevant biological processes, though with varying emphasis. These findings provide practical guidance for choosing a method according to analytical goals and highlight the value of ensemble approaches to scRNA-seq feature selection.
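The abstract reports two set-level summaries: a mean pairwise overlap between methods and a consensus set of genes selected by at least 50% of methods. The abstract does not say which overlap statistic produced the 23.7% figure, so the minimal sketch below assumes the Jaccard index averaged over all unordered method pairs. The function names and the three toy gene selections are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(selections: dict[str, set[str]]) -> float:
    """Average Jaccard index over every unordered pair of methods (assumed metric)."""
    pairs = list(combinations(selections.values(), 2))
    if not pairs:
        raise ValueError("need at least two methods to compute pairwise overlap")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def consensus_genes(selections: dict[str, set[str]], min_fraction: float = 0.5) -> set[str]:
    """Genes selected by at least `min_fraction` of the methods."""
    counts = Counter(gene for genes in selections.values() for gene in genes)
    threshold = min_fraction * len(selections)
    return {gene for gene, n in counts.items() if n >= threshold}


# Hypothetical toy selections from three made-up methods (not results from the paper).
selections = {
    "method_a": {"CD3E", "MS4A1", "LYZ", "NKG7"},
    "method_b": {"CD3E", "LYZ", "GNLY"},
    "method_c": {"MS4A1", "LYZ", "PPBP"},
}

print(round(mean_pairwise_overlap(selections), 3))  # 0.333
print(sorted(consensus_genes(selections)))  # ['CD3E', 'LYZ', 'MS4A1']
```

With 16 methods, the same consensus threshold means a gene must appear in at least 8 selections. Other overlap definitions (for example, intersection size divided by the smaller set) would yield different percentages for the same selections.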

Authors

  • Adham M. Alkhadrawi; Mohammed A.B. Mahmoud; Mian M.Y. Khalil; Abdullah All Jaber