Phenotypic reversion and target prioritization for cellular inflammation via representation learning with foundation models
Journal:
bioRxiv
Published Date:
Mar 6, 2026
Abstract
Identifying genetic perturbations that can reverse disease-associated cellular phenotypes toward a healthy state is a central challenge in early drug discovery. We present a proof-of-concept framework that leverages single-cell foundation models (scFMs) and a large-scale Perturb-seq dataset to prioritize targets for phenotypic reversion of cellular inflammation. We incorporated both basal and proinflammatory signaling conditions, the latter consisting of stimulation with interleukin-1 beta (IL-1β) and tumor necrosis factor alpha (TNF-α), to assess whether stimulation relevant to atherosclerotic disease improved identification of genes and pathways critical to disease progression. Our dataset comprised 864,115 endothelial cells subjected to 1,740 unique genetic perturbations. By profiling both conditions, we identified targets whose effects on gene expression depended on cellular state. Using scFMs, we embedded single-cell transcriptomes into high-dimensional latent spaces and ranked perturbations by their ability to shift the inflammatory transcriptomic profile toward that of untreated controls. Benchmarking against both annotated gene sets and expert-curated targets, we found robust enrichment for known regulators of inflammation and biologically relevant targets, even though the models had no prior information about these targets. Importantly, including both basal and proinflammatory conditions improved identification of inflammation-associated targets compared with using the basal condition alone, underscoring the value of incorporating disease-relevant stimulations in perturbation experiments. Our results highlight the utility of scFMs for data-driven target nomination, emphasize the role of cellular state in regulatory responses, and provide a scalable, model-agnostic approach for ML-guided target discovery. This work offers a valuable community resource for advancing biological understanding of inflammation-associated disease.
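As an illustration of the ranking step described above, the following is a minimal sketch, not the authors' implementation. The abstract does not state which scoring metric was used, so the score here (the projection of each perturbation's centroid shift onto the axis running from the stimulated-control centroid to the untreated-control centroid) is an assumption. The function name `rank_by_reversion`, the labels `stim_control` and `basal_control`, and the toy two-dimensional embeddings are all hypothetical; in practice the embeddings would come from an scFM.

```python
import numpy as np

def rank_by_reversion(embeddings, labels,
                      stimulated_ctrl="stim_control",
                      healthy_ctrl="basal_control"):
    """Rank perturbations by how far they move cells toward the healthy state.

    Assumed score: fraction of the stimulated-to-healthy centroid axis covered
    by the perturbation centroid (0 = no shift, 1 = reaches healthy centroid).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    healthy = embeddings[labels == healthy_ctrl].mean(axis=0)
    diseased = embeddings[labels == stimulated_ctrl].mean(axis=0)
    axis = healthy - diseased          # direction of phenotypic reversion
    norm_sq = float(axis @ axis)

    scores = {}
    for name in np.unique(labels):
        if name in (healthy_ctrl, stimulated_ctrl):
            continue                   # controls define the axis; skip them
        centroid = embeddings[labels == name].mean(axis=0)
        scores[str(name)] = float((centroid - diseased) @ axis) / norm_sq

    # Highest score first: strongest shift toward untreated controls
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): 2-D "latent space", two cells per group.
toy_embeddings = np.array([
    [0.0, 0.0], [0.0, 0.0],    # stimulated controls (inflamed state)
    [1.0, 0.0], [1.0, 0.0],    # untreated controls (healthy state)
    [0.8, 0.1], [0.8, -0.1],   # geneA knockdown under stimulation
    [0.1, 0.5], [0.1, 0.5],    # geneB knockdown under stimulation
])
toy_labels = ["stim_control"] * 2 + ["basal_control"] * 2 \
           + ["geneA"] * 2 + ["geneB"] * 2

ranking = rank_by_reversion(toy_embeddings, toy_labels)
print(ranking)
```

In this toy example geneA ranks above geneB because its centroid lies most of the way along the reversion axis, while geneB moves mainly in an orthogonal direction. A projection-based score is only one possible choice; distance to the untreated-control centroid or a classifier-based score would be equally plausible alternatives.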