Using Deep Learning to Extrapolate Protein Expression Measurements.

Journal: Proteomics

Abstract

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow imputation only for proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still lacking. Here, a novel method is proposed that uses deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. The method is tested on four datasets, including human cell lines and human and mouse tissues. It predicts the protein expression values with average R² scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave an R² score of […] when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotated attributes, and that these predictions can be used to highlight aberrant protein abundance values.

Authors

  • Mitra Parissa Barzine
    European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
  • Karlis Freivalds
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • James C Wright
    Institute of Cancer Research, London, SW3 6JB, UK.
  • Mārtiņš Opmanis
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • Darta Rituma
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • Fatemeh Zamanzad Ghavidel
    Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway.
  • Andrew F Jarnuczak
    European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
  • Edgars Celms
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • Kārlis Čerāns
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • Inge Jonassen
    Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway.
  • Lelde Lace
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.
  • Juan Antonio Vizcaíno
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
  • Jyoti Sharma Choudhary
    Institute of Cancer Research, London, SW3 6JB, UK.
  • Alvis Brazma
    European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK. brazma@ebi.ac.uk.
  • Juris Viksna
    Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.