scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders
Journal: arXiv
Published Date: Feb 12, 2025
Abstract
Single-nucleus RNA sequencing (snRNA-seq) has significantly advanced our
understanding of the etiology of neurodegenerative disorders. However,
the low quality of specimens derived from postmortem brain tissues, combined
with the high variability caused by disease heterogeneity, makes it challenging
to integrate snRNA-seq data from multiple sources for precise analyses. To
address these challenges, we present scMamba, a pre-trained model designed to
improve the quality and utility of snRNA-seq analysis, with a particular focus
on neurodegenerative diseases. Inspired by the recent Mamba model, scMamba
introduces a novel architecture that incorporates a linear adapter layer, gene
embeddings, and bidirectional Mamba blocks, enabling efficient processing of
snRNA-seq data while preserving information from the raw input. Notably,
scMamba learns generalizable features of cells and genes through pre-training
on snRNA-seq data, without relying on dimensionality reduction or selection of
highly variable genes. We demonstrate that scMamba outperforms benchmark
methods in various downstream tasks, including cell type annotation, doublet
detection, imputation, and the identification of differentially expressed
genes.
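
The three components named in the abstract (a linear adapter layer, gene embeddings, and bidirectional Mamba blocks) can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the adapter maps expression values, how the two scan directions are combined, or how a cell-level embedding is pooled, so those choices are guesses. The function names (`ssm_scan`, `bidirectional_block`, `encode_cell`) are hypothetical, and `ssm_scan` is a fixed-decay linear recurrence used as a stand-in for a real selective state-space (Mamba) block.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssm_scan(x, decay):
    """Toy recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t over the gene axis.
    Stand-in for a Mamba block (assumption): no input-dependent selection here."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out

def bidirectional_block(x, decay_fwd, decay_bwd):
    """Scan the gene sequence in both directions and add a residual connection.
    Summing the two directions is an assumption, not taken from the paper."""
    fwd = ssm_scan(x, decay_fwd)
    bwd = ssm_scan(x[::-1], decay_bwd)[::-1]
    return x + fwd + bwd

def encode_cell(expr, gene_emb, w_adapter, b_adapter, decays):
    """expr: (n_genes,) raw expression values for one nucleus."""
    # Linear adapter (assumed form): lift each scalar expression value to d_model dims.
    tokens = expr[:, None] * w_adapter[None, :] + b_adapter[None, :]
    # Gene embeddings: one learned vector per gene, added to its token.
    tokens = tokens + gene_emb
    for decay_fwd, decay_bwd in decays:
        tokens = bidirectional_block(tokens, decay_fwd, decay_bwd)
    return tokens  # (n_genes, d_model) per-gene features

# Toy sizes; random values stand in for learned parameters.
n_genes, d_model, n_layers = 12, 8, 2
gene_emb = rng.normal(size=(n_genes, d_model))
w_adapter = rng.normal(size=d_model)
b_adapter = np.zeros(d_model)
decays = [(rng.uniform(0.5, 0.9, d_model), rng.uniform(0.5, 0.9, d_model))
          for _ in range(n_layers)]

expr = rng.poisson(1.0, size=n_genes).astype(float)  # all genes kept, no HVG filter
gene_features = encode_cell(expr, gene_emb, w_adapter, b_adapter, decays)
cell_embedding = gene_features.mean(axis=0)  # mean pooling is an assumption
```

Because every gene is treated as one token, the sequence length equals the number of genes. A scan whose cost grows linearly with sequence length is what makes it plausible to process the full gene set rather than a reduced one, and the backward scan lets each gene's features depend on genes that come later in the ordering as well as earlier.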