Sequence representation approaches for sequence-based protein prediction tasks that use deep learning.

Journal: Briefings in functional genomics

Published Date: Mar 2, 2021

Abstract

Deep learning is increasingly used in bioinformatics, especially in sequence-based protein prediction tasks, because large amounts of biological data are now available and deep learning techniques have developed rapidly in recent years. For these tasks, selecting a suitable model architecture is essential, and sequence data representation is a major factor in model performance. Here, we summarize the main approaches used to represent protein sequence data (amino acid sequence encoding or embedding): end-to-end embedding methods, non-contextual embedding methods, embedding methods that use transfer learning, and others applied to specific tasks (such as protein sequence embedding based on extracted features for protein structure prediction and graph convolutional network-based embedding for drug discovery). We also review the architectures of the various types of embedding models from a theoretical standpoint, together with the development of these sequence embedding approaches, to help researchers and users select the model that best suits their requirements.
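As a minimal illustration of what "amino acid sequence encoding" can mean in practice, the sketch below shows two of the simplest representations: integer indices (the usual input to a learned, end-to-end embedding layer) and one-hot vectors. It is not taken from the paper. The residue ordering, the function names, and the restriction to the 20 standard amino acids are all assumptions made for the example.

```python
# Illustrative sketch only: names and residue ordering are assumptions,
# not something defined in the reviewed paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def index_encode(sequence):
    """Map each residue to an integer index.

    Index sequences like this are what a trainable embedding layer
    typically consumes in an end-to-end model.
    """
    return [AA_TO_INDEX[aa] for aa in sequence.upper()]


def one_hot_encode(sequence):
    """Map each residue to a 20-dimensional one-hot vector."""
    vectors = []
    for idx in index_encode(sequence):
        row = [0] * len(AMINO_ACIDS)
        row[idx] = 1  # exactly one position is set per residue
        vectors.append(row)
    return vectors
```

Non-standard or ambiguous residue codes (for example `X` or `U`) raise a `KeyError` in this sketch; a real pipeline would need an explicit policy for them, such as an extra "unknown" index.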

Authors

Feifei Cui
School of Computer Science and Technology, Hainan University, Haikou 570228, China.

Zilong Zhang
School of Computer Science and Technology, Hainan University, Haikou 570228, China.

Quan Zou

Keywords

Amino Acid Sequence; Computational Biology; Deep Learning; Neural Networks, Computer; Proteins

External Resources

PubMed ID: 33527980
