Using Deep Learning to Extrapolate Protein Expression Measurements.

Journal: Proteomics

Published Date: Oct 16, 2020

Abstract

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, including human cell lines and human and mouse tissues. This method predicts the protein expression values with average scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave a when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotated attributes and can be used to highlight aberrant protein abundance values.

Authors

Mitra Parissa Barzine
European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.

Karlis Freivalds
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

James C Wright
Institute of Cancer Research, London, SW3 6JB, UK.

Mārtiņš Opmanis
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Darta Rituma
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Fatemeh Zamanzad Ghavidel
Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway.

Andrew F Jarnuczak
European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.

Edgars Celms
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Kārlis Čerāns
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Inge Jonassen
Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway.

Lelde Lace
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Juan Antonio Vizcaíno
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

Jyoti Sharma Choudhary
Institute of Cancer Research, London, SW3 6JB, UK.

Alvis Brazma
European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK. brazma@ebi.ac.uk.

Juris Viksna
Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia.

Keywords

Animals; Deep Learning; Mass Spectrometry; Mice; Molecular Sequence Annotation; Proteins; Proteomics

External Resources

PubMed ID: 32937025
