Predicting gene and protein expression levels from DNA and protein sequences with Perceiver.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore, quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. For mRNA levels, the available approaches feed the sequence upstream and downstream of the Transcription Start Site (TSS) to neural networks. State-of-the-art models (e.g., Xpresso and Basenji) predict mRNA levels using Convolutional Neural Networks (CNNs) or Long Short-Term Memory (LSTM) networks. However, CNN predictions depend on the convolutional kernel size, and LSTMs struggle to capture long-range dependencies in the sequence. For protein levels, as far as we know, no model predicts them from gene or protein sequences.
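
As an illustration of the input described above, the following minimal sketch extracts a window around a TSS and one-hot encodes it, which is a common way to present DNA to a neural network. It is not taken from the paper: the function names `tss_window` and `one_hot`, the window sizes, the A/C/G/T channel order, and the choice to pad with N and encode unknown bases as all-zero rows are assumptions made for this example.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def tss_window(chrom_seq, tss, upstream, downstream):
    """Return the bases from `upstream` before the TSS index to
    `downstream` after it, padding with 'N' past the sequence ends."""
    start = tss - upstream
    end = tss + downstream
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot(seq):
    """Encode a DNA string as a (length, 4) float array.
    Bases outside A/C/G/T (e.g. 'N') become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


# Toy usage: 2 bases upstream and 3 downstream of a hypothetical TSS at index 4.
window = tss_window("ACGTACGTAC", tss=4, upstream=2, downstream=3)
features = one_hot(window)  # array of shape (5, 4)
```

Real models of this kind operate on windows of thousands of bases. The toy sizes here only keep the example readable.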

Authors

  • Matteo Stefanini
  • Marta Lovino
    Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca Degli Abruzzi 24, 10129 Torino, Italy. marta.lovino@polito.it.
  • Rita Cucchiara
  • Elisa Ficarra
    Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca Degli Abruzzi 24, 10129 Torino, Italy. elisa.ficarra@polito.it.