Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations
Journal: arXiv
Published Date: Jul 1, 2025
Abstract
We train a neural network to predict distributional responses in gene
expression following genetic perturbations. This is an essential task in
early-stage drug discovery, where such responses can offer insights into gene
function and inform target identification. Existing methods predict only
changes in mean expression, overlooking the stochasticity inherent in
single-cell data. In contrast, we offer a more realistic view of cellular
responses by modeling expression distributions. Our model predicts gene-level
histograms conditioned on perturbations and outperforms baselines in capturing
higher-order statistics, such as variance, skewness, and kurtosis, at a
fraction of the training cost. To generalize to unseen perturbations, we
incorporate prior knowledge via gene embeddings from large language models
(LLMs). While modeling a richer output space, the method remains competitive in
predicting mean expression changes. This work offers a practical step towards
more expressive and biologically informative models of perturbation effects.
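
To make the evaluation targets concrete, the sketch below shows one way the higher-order statistics named above (variance, skewness, kurtosis) could be read off a single predicted gene-level histogram. It is an illustration only, not the authors' implementation: the function name `histogram_moments`, the bin layout, and the choice to report excess kurtosis are all assumptions made for this example.

```python
import numpy as np


def histogram_moments(probs, bin_centers):
    """Summarize a discrete histogram by its first four moments.

    Hypothetical helper: `probs` holds the (possibly unnormalized) mass
    per bin and `bin_centers` the expression value each bin represents.
    """
    probs = np.asarray(probs, dtype=float)
    centers = np.asarray(bin_centers, dtype=float)
    if probs.shape != centers.shape:
        raise ValueError("probs and bin_centers must have the same shape")
    if probs.sum() <= 0:
        raise ValueError("histogram must carry positive total mass")

    # Normalize so the bin masses form a probability distribution.
    probs = probs / probs.sum()

    mean = float(np.sum(probs * centers))
    dev = centers - mean
    var = float(np.sum(probs * dev**2))

    if var == 0.0:
        # All mass in one bin: skewness and kurtosis are undefined.
        skew = kurt = float("nan")
    else:
        std = np.sqrt(var)
        skew = float(np.sum(probs * dev**3) / std**3)
        # Excess kurtosis (0 for a Gaussian); an assumed convention.
        kurt = float(np.sum(probs * dev**4) / var**2 - 3.0)

    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


# Toy example: a symmetric three-bin histogram for one gene.
example = histogram_moments([0.25, 0.5, 0.25], [0.0, 1.0, 2.0])
```

Comparing these summaries between predicted and observed histograms is one plausible way to score a distributional model; a model that predicts only the mean provides no information about the last three entries.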