Predicting and interpreting protein and phosphoprotein abundance from pan-cancer and single-cell transcriptomes.
Journal: iScience
Published Date: Jan 27, 2026
Abstract
RNA expression is often used as a proxy for the proteins that drive phenotype and disease, yet it is a poor predictor of protein abundance. We developed DeepGxP, a deep-learning model trained on The Cancer Genome Atlas pan-cancer data, to predict protein abundance from transcriptome profiles. DeepGxP outperformed conventional models, achieving a median Pearson correlation of 0.68 (n = 187), with predictive performance of 0.74 and 0.64 for proteins with high (≥0.31) and low (<0.31) self-gene/protein correlation, respectively. We also developed DeepEnrich, an integrated gradient-based interpretation framework that identifies predictor genes and their enriched functions. For example, predictors of cyclin B1 and cyclin E2 are enriched in mitotic chromatid segregation and G2/M transition, respectively. In lung adenocarcinoma, we uncovered distinct EGFR/HER2 phosphorylation patterns in alveolar cells. In breast cancer, p53 protein, but not TP53 mRNA, correlated with survival. DeepGxP also accurately predicted the abundance of surface proteins in single cells, confirming cell identification. Our findings underscore DeepGxP's potential for decoding gene-to-protein relationships in cancer biomarker discovery.
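The evaluation described in the abstract (a per-protein Pearson correlation, summarized overall and split at a self-gene/protein correlation of 0.31) can be illustrated with a minimal sketch. This is not the authors' code: the function names `per_protein_pearson` and `summarize`, the use of the median within each group, and the synthetic data are all assumptions made for illustration; only the 0.31 threshold comes from the abstract.

```python
import numpy as np


def per_protein_pearson(y_true, y_pred):
    """Pearson r per protein (column), computed across samples (rows).

    Assumes no column is constant; a constant column would divide by zero.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


def summarize(y_true, y_pred, self_corr, threshold=0.31):
    """Median r over all proteins and within the high/low self-correlation groups.

    self_corr[j] is the (hypothetical) correlation between protein j and the
    mRNA of its own gene; 0.31 is the cut-off quoted in the abstract.
    """
    r = per_protein_pearson(y_true, y_pred)
    high = self_corr >= threshold
    return {
        "median_all": float(np.median(r)),
        "median_high": float(np.median(r[high])),
        "median_low": float(np.median(r[~high])),
    }


# Synthetic stand-in data: 200 samples, 6 proteins (not real measurements).
rng = np.random.default_rng(0)
y_true = rng.normal(size=(200, 6))
y_pred = y_true + rng.normal(scale=0.8, size=y_true.shape)
self_corr = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

summary = summarize(y_true, y_pred, self_corr)
print(summary)
```

Reporting a median rather than a mean keeps the summary robust to a few proteins that are predicted very poorly or very well; whether the paper's 0.74 and 0.64 figures are medians is not stated in the abstract.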