Deep learning on chaos game representation for proteins.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Classification of protein sequences is a major task in bioinformatics with many applications. Various machine learning methods are applied to these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods require that protein sequences first be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. Motivated by the outstanding performance of deep neural networks (DNN) on image recognition, we instead used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to also work for protein sequences, resulting in an n-flakes representation, an image with several icosagons.
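To make the encoding step concrete, the following is a minimal sketch of an FCGR on a regular 20-gon (icosagon), not the authors' implementation. It assumes one amino acid per vertex in an arbitrary alphabetical order, a start point at the centre, and the standard non-overlapping n-flake contraction ratio; the names `nflake_ratio`, `fcgr` and the `resolution` parameter are hypothetical and chosen for illustration only.

```python
import math

# Assumption: the 20 standard amino acids, one per polygon vertex, in
# alphabetical one-letter order. The ordering is an illustrative choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nflake_ratio(n):
    """Standard non-overlapping n-flake scale ratio (assumed here)."""
    k_max = n // 4
    cos_sum = sum(math.cos(2.0 * math.pi * k / n) for k in range(1, k_max + 1))
    return 1.0 / (2.0 * (1.0 + cos_sum))


def fcgr(sequence, resolution=32, alphabet=AMINO_ACIDS):
    """Return a resolution x resolution count matrix for a sequence."""
    n = len(alphabet)
    # Vertices of a regular n-gon on the unit circle.
    vertices = [
        (math.cos(2.0 * math.pi * i / n), math.sin(2.0 * math.pi * i / n))
        for i in range(n)
    ]
    index = {aa: i for i, aa in enumerate(alphabet)}
    r = nflake_ratio(n)

    matrix = [[0] * resolution for _ in range(resolution)]
    x, y = 0.0, 0.0  # start at the centre of the polygon
    for residue in sequence.upper():
        if residue not in index:
            continue  # skip ambiguous or unknown symbols such as 'X'
        vx, vy = vertices[index[residue]]
        # Chaos game step: contract the current point towards the vertex.
        x = vx + r * (x - vx)
        y = vy + r * (y - vy)
        # Map coordinates from [-1, 1] to a grid cell and count the visit.
        col = min(int((x + 1.0) / 2.0 * resolution), resolution - 1)
        row = min(int((y + 1.0) / 2.0 * resolution), resolution - 1)
        matrix[row][col] += 1
    return matrix
```

With four vertices the same ratio reduces to 0.5, which matches the classic CGR for nucleotide sequences. The resulting count matrix can be treated as a greyscale image and passed to an image classifier, or flattened into a feature vector for an SVM or RF.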

Authors

  • Hannah F Löchel
    Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35032, Germany.
  • Dominic Eger
    Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35032, Germany.
  • Theodor Sperlea
    Department of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35032, Germany.
  • Dominik Heider
    Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.