Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics.

Journal: Cell Systems
PMID:

Abstract

Cancer is driven by genomic alterations, but the processes causing this disease are largely carried out by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze the development of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked participants for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models whose predictors included the transcript levels of the corresponding genes, interactions between genes, conservation across tumor types, and, for phosphorylation prediction, phosphosite proximity. Proteins from metabolic pathways were the best predicted, whereas proteins from complexes were the worst predicted. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.
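The abstract names the transcript level of the corresponding gene and interactions between genes as predictors, and reports that an ensemble of models performed best. The following is a minimal sketch of that general idea only, not the challenge's scoring procedure or any participating team's model. It fits one single-predictor linear model from a protein's own transcript and another from the transcript of a hypothetical interacting partner, averages their predictions, and scores the result on held-out samples. Pearson correlation is used here as one common agreement measure; treating it as the evaluation metric is an assumption. All values, gene labels, and helper names (`fit_linear`, `ensemble`, etc.) are made up for illustration.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_linear(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def predict(params, x):
    """Apply a fitted (intercept, slope) pair to new predictor values."""
    intercept, slope = params
    return [intercept + slope * v for v in x]


def ensemble(*predictions):
    """Unweighted ensemble: average several per-sample prediction vectors."""
    return [mean(vals) for vals in zip(*predictions)]


# Toy, made-up log-abundance values for one hypothetical protein "P".
train = {
    "mrna_P": [1.0, 2.0, 3.0, 4.0, 5.0],        # transcript of the same gene
    "mrna_partner": [2.0, 1.0, 4.0, 3.0, 5.0],  # transcript of a hypothetical interacting gene
    "protein_P": [1.2, 1.9, 3.1, 3.8, 5.2],
}
test = {
    "mrna_P": [1.5, 2.5, 4.5],
    "mrna_partner": [1.0, 3.0, 5.0],
    "protein_P": [1.4, 2.6, 4.7],
}

# One model per predictor, each trained only on the training samples.
m_self = fit_linear(train["mrna_P"], train["protein_P"])
m_partner = fit_linear(train["mrna_partner"], train["protein_P"])

# Combine the two models' held-out predictions and score them.
combined = ensemble(
    predict(m_self, test["mrna_P"]),
    predict(m_partner, test["mrna_partner"]),
)
r = pearson(combined, test["protein_P"])
print(f"Pearson r on held-out samples: {r:.3f}")
```

Plain averaging is the simplest ensemble rule; weighted averaging or stacking a second-level model on the individual predictions are common alternatives. With real proteogenomic data, one such model would typically be fitted per protein or phosphosite, and the per-protein scores summarized across the proteome.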

Authors

  • Mi Yang
    Department of Neurology, School of Medicine, The Fourth Affiliated Hospital of Zhejiang University, Yiwu, China.
  • Francesca Petralia
    Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Zhi Li
    Department of Nursing, Zhongshan Hospital of Traditional Chinese Medicine Affiliated to Guangzhou University of Traditional Chinese Medicine, Zhongshan, China.
  • Hongyang Li
    Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA. hyangl@umich.edu.
  • Weiping Ma
    Department of Pharmacology, Qingdao University Medical College, 422 Boya Building, 308 Ningxia Road, Qingdao, Shandong, 266071, China.
  • Xiaoyu Song
    Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Sunkyu Kim
    Department of Computer Science and Engineering, Korea University, Seoul 02841, South Korea.
  • Heewon Lee
    Department of Computer Science and Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea.
  • Han Yu
    Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY), Nanyang Technological University, Singapore, 639798 Singapore.
  • Bora Lee
    Department of Biochemistry, Chonnam National University Medical School, Hwasun, Republic of Korea.
  • Seohui Bae
    Deargen, Daejeon 34051, Republic of Korea; Department of Biological Science, Department of Bio-Brain Engineering, KAIST, Daejeon, Republic of Korea.
  • Eunji Heo
    Deargen, Daejeon 34051, Republic of Korea; Department of AI, KAIST, Daejeon 34141, Republic of Korea.
  • Jan Kaczmarczyk
    Ardigen, Kraków 30-394, Poland.
  • Piotr Stępniak
    Ardigen, Kraków 30-394, Poland.
  • Michał Warchoł
    Ardigen, Kraków 30-394, Poland.
  • Thomas Yu
    Computational Oncology, Sage Bionetworks, Seattle, Washington.
  • Anna P Calinawan
    Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Paul C Boutros
    Ontario Institute of Cancer Research, Toronto, ON M5G 0A3, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON M5S 1A8, Canada; Department of Human Genetics, University of California, Los Angeles, CA 90095, USA; Department of Urology, University of California, Los Angeles, CA 90095, USA; Institute for Precision Health, University of California, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA 90095, USA.
  • Samuel H Payne
    Department of Biology, Brigham Young University, Provo, UT 84604, USA.
  • Boris Reva
    Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Emily Boja
    Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA.
  • Henry Rodriguez
    Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA.
  • Gustavo Stolovitzky
    Thomas J. Watson Research Center, IBM, Yorktown Heights, NY, USA.
  • Yuanfang Guan
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. gyuanfan@umich.edu.
  • Jaewoo Kang
    Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea.
  • Pei Wang
    College of Engineering and Technology, Key Laboratory of Agricultural Equipment for Hilly and Mountain Areas, Southwest University, Chongqing, China.
  • David Fenyö
    Department of Biochemistry and Molecular Pharmacology, Institute for Systems Genetics, New York University Grossman School of Medicine, New York, USA.
  • Julio Saez-Rodriguez
    Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Bioquant, Heidelberg, Germany.