Predicting and interpreting protein and phosphoprotein abundance from pan-cancer and single-cell transcriptomes.

Journal: iScience

Published Date: Jan 27, 2026

Abstract

Proteins that impact phenotype and disease are often approximated by RNA expression, which is a poor proxy for protein abundance. We developed DeepGxP, a deep-learning model trained on The Cancer Genome Atlas pan-cancer data, to predict protein abundance from transcriptome profiles. DeepGxP outperformed conventional models, achieving a median Pearson correlation of 0.68 (n = 187) and predictive performance of 0.74 and 0.64 for proteins with high (≥0.31) and low (<0.31) self-gene/protein correlation, respectively. We also developed DeepEnrich, an interpretation framework based on integrated gradients that identifies predictor genes and their enriched functions. For example, predictors of cyclin B1 and cyclin E2 are enriched in mitotic chromatid segregation and G2/M transition, respectively. In lung adenocarcinoma, we uncovered distinct EGFR/HER2 phosphorylation patterns in alveolar cells. In breast cancer, p53 protein, but not TP53 mRNA, correlated with survival. DeepGxP also accurately predicted the abundance of single-cell surface proteins, confirming cell identification. Our findings underscore DeepGxP's potential for decoding gene-to-protein relationships in cancer biomarker discovery.
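
The evaluation summarized in the abstract (a per-protein Pearson correlation, reported as a median and split at a self-gene/protein correlation of 0.31) can be illustrated with a minimal sketch. This is not the authors' code. The function names `per_protein_pearson` and `summarize`, the array layout (samples in rows, proteins in columns), and the use of the median for the high and low groups are all assumptions made for illustration; the abstract states only that the overall figure is a median.

```python
import numpy as np


def per_protein_pearson(y_true, y_pred):
    """Pearson r between measured and predicted abundance, one value per protein.

    Both arrays are assumed to have shape (n_samples, n_proteins).
    Columns with zero variance are not handled and would divide by zero.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


def summarize(r_pred, r_self, threshold=0.31):
    """Median prediction correlation overall and by self-correlation group.

    r_pred: per-protein correlation of predicted vs. measured abundance.
    r_self: per-protein correlation of the protein with its own mRNA.
    threshold: cut-off from the abstract (>= 0.31 is "high", < 0.31 is "low").
    Using the median within each group is an assumption of this sketch.
    """
    high = r_self >= threshold
    return {
        "median_all": float(np.median(r_pred)),
        "median_high": float(np.median(r_pred[high])),
        "median_low": float(np.median(r_pred[~high])),
    }
```

In use, `per_protein_pearson` would be called once with model predictions to obtain `r_pred` and once with each protein's cognate mRNA expression in place of the predictions to obtain `r_self`; `summarize` then reports the three medians.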
