Representation learning of single-cell RNA-seq data.
Journal:
RNA (New York, N.Y.)
Published Date:
Mar 16, 2026
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a cornerstone experimental technique in tissue biology, with gene expression data for over 100 million cells available in public repositories. The high dimensionality, sparsity, and technical noise inherent to scRNA-seq data have motivated the development of a broad spectrum of representation learning approaches. These methods learn compressed, lower-dimensional representations of single-cell transcriptomes that are meant to preserve essential variation while reducing noise, and can be used for clustering, visualization, trajectory inference, and other downstream tasks. Furthermore, methods have emerged that aim to integrate data from multiple experiments by learning a common latent representation. In this review, we frame factor models, autoencoders, contrastive learning approaches, and transformer-based foundation models as distinct instances of the representation learning paradigm for scRNA-seq. We provide a coherent taxonomy of these methods that articulates their conceptual foundations, shared assumptions, and key distinctions. We also discuss benchmarking and identify major challenges and open questions that will shape the future of the field.