Benchmarking Chemical, Genetic, and Cell Line Encodings for Cancer Perturbation Response Prediction
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Estimating the response of tumor cells to specific perturbations is crucial for identifying effective treatments that selectively target cancer cells while sparing healthy ones, enabling personalized medicine approaches. Large-scale initiatives, such as DepMap, have profiled cancer cell line responses to various drug treatments and gene knockouts, facilitating the development of computational models that predict the sensitivity of cancer cells to different perturbations. Existing models use diverse methods for encoding perturbations, including various chemical fingerprints and types of gene-gene relationships. They also rely on different architectures and are often trained on distinct datasets. This variability makes it unclear which chemical, genetic, or cell line encoding is most informative for predicting cancer cell viability after perturbation. To address this gap, we systematically evaluated approaches for encoding chemical perturbations, genetic perturbations, and cell lines on two tasks: predicting cell viability and predicting gene dependency. For genetic perturbations, STRING-based encodings yielded the highest performance, considerably outperforming encodings based on GO terms and protein language models, which had shown promising results in previous perturbation prediction studies. For chemical perturbations, most encoders performed comparably, but those pre-trained on other bioassay data performed best. Finally, for cell line encodings, raw gene expression features outperformed more sophisticated approaches, such as transcriptomics foundation model embeddings, as well as genotype-based encodings. Together, our results identify promising approaches for encoding chemical and genetic perturbations and enable virtual screening for perturbations with selective toxicity.
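To make the general setup concrete (a perturbation encoding paired with a cell line encoding, then fed to a viability predictor), here is a minimal sketch under clearly hypothetical assumptions. The gene panel, STRING-style combined scores, expression values, and viability labels are all invented for illustration; the helpers `string_encoding`, `expression_encoding`, `pair_features`, and `knn_predict` are hypothetical names, not the paper's code; and k-nearest-neighbor regression is a deliberately simple stand-in predictor, not one of the architectures benchmarked in the study.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL gene panel and STRING-style combined scores (0-1000).
# Values are illustrative only, not taken from the STRING database.
GENE_PANEL: List[str] = ["KRAS", "BRAF", "MAP2K1", "TP53", "MDM2"]
STRING_SCORES: Dict[Tuple[str, str], int] = {
    ("KRAS", "BRAF"): 990,
    ("BRAF", "MAP2K1"): 999,
    ("KRAS", "MAP2K1"): 700,
    ("TP53", "MDM2"): 999,
}


def string_encoding(target: str) -> List[float]:
    """Encode a gene knockout by its symmetric STRING score to each panel gene, scaled to [0, 1]."""
    vec = []
    for gene in GENE_PANEL:
        if gene == target:
            vec.append(1.0)  # treat a gene as maximally related to itself
        else:
            score = STRING_SCORES.get((target, gene), STRING_SCORES.get((gene, target), 0))
            vec.append(score / 1000.0)
    return vec


def expression_encoding(expr: Dict[str, float]) -> List[float]:
    """Raw expression features for the panel genes; genes missing from the profile default to 0."""
    return [expr.get(gene, 0.0) for gene in GENE_PANEL]


def pair_features(target: str, expr: Dict[str, float]) -> List[float]:
    """Concatenate the perturbation encoding with the cell line encoding."""
    return string_encoding(target) + expression_encoding(expr)


def knn_predict(train: List[Tuple[List[float], float]], query: List[float], k: int = 2) -> float:
    """Average the labels of the k training pairs closest to the query (Euclidean distance)."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


# HYPOTHETICAL cell line expression profiles (e.g. log2(TPM+1)) and viability labels.
kras_high = {"KRAS": 8.0, "BRAF": 5.0, "MAP2K1": 6.0, "TP53": 2.0, "MDM2": 3.0}
tp53_wt = {"KRAS": 4.0, "BRAF": 5.0, "MAP2K1": 6.0, "TP53": 7.0, "MDM2": 6.0}

train = [
    (pair_features("KRAS", kras_high), 0.2),
    (pair_features("BRAF", kras_high), 0.3),
    (pair_features("MDM2", tp53_wt), 0.25),
    (pair_features("TP53", tp53_wt), 0.9),
]

# Predict viability for an unseen knockout (MAP2K1) in a seen cell line.
pred = knn_predict(train, pair_features("MAP2K1", kras_high), k=2)
print(pred)
```

In this toy example the MAP2K1 knockout lands nearest to the BRAF and KRAS knockouts in the same cell line, because its STRING-style profile resembles theirs, so the prediction averages their labels. Note that the expression values sit on a larger scale than the [0, 1] network scores and therefore dominate the Euclidean distance; a real pipeline would typically standardize features before combining them.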