How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval
Journal: arXiv
Published Date: Sep 10, 2024
Abstract
Predicting molecular impact on cellular function is a core challenge in
therapeutic design. Phenomic experiments, designed to capture cellular
morphology, use microscopy-based techniques and offer a high-throughput
solution for uncovering molecular impact on the cell. In this work,
we learn a joint latent space between molecular structures and microscopy
phenomic experiments, aligning paired samples with contrastive learning.
Specifically, we study the problem of Contrastive PhenoMolecular Retrieval,
which consists of zero-shot molecular structure identification conditioned on
phenomic experiments. We assess challenges in multi-modal learning of phenomics
and molecular modalities such as experimental batch effect, inactive molecule
perturbations, and encoding perturbation concentration. We demonstrate improved
multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics
model, (2) a novel inter-sample similarity-aware loss, and (3) models
conditioned on a representation of molecular concentration. Following this
recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages
a pre-trained phenomics model to demonstrate significant performance gains
across perturbation concentrations, molecular scaffolds, and activity
thresholds. In particular, we demonstrate an 8.1x improvement in zero-shot
molecular retrieval of active molecules over the previous state-of-the-art,
reaching 77.33% in top-1% accuracy. These results open the door for machine
learning to be applied in virtual phenomics screening, which can significantly
benefit drug discovery applications.
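
To make the retrieval setup concrete, the sketch below shows a generic symmetric contrastive (InfoNCE, CLIP-style) objective over paired phenomics and molecule embeddings, together with a top-k% retrieval accuracy of the kind reported above. It is an illustration under stated assumptions, not the MolPhenix implementation: it does not include the inter-sample similarity-aware loss or concentration conditioning, the function names (`symmetric_info_nce`, `top_percent_accuracy`) and the temperature value are hypothetical, and the embeddings are synthetic random vectors rather than model outputs.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(pheno, mols):
    """Cosine similarity between every phenomics and every molecule embedding."""
    p = [normalize(v) for v in pheno]
    m = [normalize(v) for v in mols]
    return [[sum(a * b for a, b in zip(pv, mv)) for mv in m] for pv in p]


def _cross_entropy_diag(logits):
    """Mean of -log softmax(row_i)[i]: the true pair sits on the diagonal."""
    total = 0.0
    for i, row in enumerate(logits):
        mx = max(row)  # subtract the max for numerical stability
        log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(pheno, mols, temperature=0.1):
    """Generic CLIP-style loss; pheno[i] and mols[i] are assumed to be paired."""
    sims = similarity_matrix(pheno, mols)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    # Average the phenomics-to-molecule and molecule-to-phenomics directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


def top_percent_accuracy(pheno, mols, percent=1.0):
    """Fraction of queries whose true molecule ranks in the top `percent`%."""
    sims = similarity_matrix(pheno, mols)
    k = max(1, math.ceil(len(mols) * percent / 100.0))
    hits = 0
    for i, row in enumerate(sims):
        # Rank = number of candidates scoring strictly above the true molecule.
        rank = sum(1 for s in row if s > row[i])
        hits += rank < k
    return hits / len(sims)


if __name__ == "__main__":
    # Synthetic stand-ins: "phenomics" embeddings are noisy copies of molecules.
    rng = random.Random(0)
    mols = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
    pheno = [[x + 0.1 * rng.gauss(0, 1) for x in v] for v in mols]
    print("loss:", symmetric_info_nce(pheno, mols))
    print("top-1% accuracy:", top_percent_accuracy(pheno, mols, percent=1.0))
```

In this sketch, zero-shot retrieval amounts to ranking all candidate molecules by cosine similarity to a phenomics embedding; a query counts as correct when its paired molecule falls within the top 1% of candidates.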