DeepPI: Alignment-Free Analysis of Flexible Length Proteins Based on Deep Learning and Image Generator.

Journal: Interdisciplinary sciences, computational life sciences
Published Date:

Abstract

With the rapid development of next-generation sequencing (NGS) technology, the number of known protein sequences has grown exponentially. Because analyzing large numbers of proteins through biological experiments is costly and time-consuming, computational methods have been introduced into protein functional studies. In recent years, new approaches based on deep learning have been proposed to overcome the limitations of conventional methods. Although deep learning-based methods make effective use of features related to protein function, they are limited to fixed-length sequences and consider only information from adjacent amino acids. New protein analysis tools are therefore required that can extract functional features from proteins of flexible length and train models on them. We introduce DeepPI, a deep learning-based tool for analyzing proteins in large-scale databases. The proposed model uses global average pooling, so it can be applied to proteins of flexible length and loses less information than existing algorithms that require fixed input sizes. The image generator converts a one-dimensional sequence into a distinct two-dimensional structure, from which common parts of various shapes can be extracted. Finally, filtering techniques automatically detect representative data from the entire database and ensure coverage of large protein databases. We demonstrate that DeepPI has been successfully applied to large databases such as the Pfam-A database. Comparative experiments on four types of image generators illustrated the impact of structure on feature extraction. Filtering performance was verified across a range of parameter values, and the technique proved applicable to large databases. DeepPI achieves higher family classification accuracy than existing methods for protein function inference.
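The role of global average pooling in handling flexible-length input can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from DeepPI: the row-by-row folding in `sequence_to_grid` is a hypothetical stand-in for the paper's image generators, the one-hot channels stand in for learned convolutional feature maps, and all function names and the `width` parameter are invented here. The only point it shows is that averaging each channel over all spatial positions yields a vector whose length depends on the number of channels, not on the protein length.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_to_grid(seq: str, width: int) -> List[List[int]]:
    """Hypothetical image generator: fold the sequence row by row.

    Cells hold residue codes 1..20; 0 marks padding in the last row.
    Raises ValueError for an empty sequence or a non-standard residue.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    codes = [AMINO_ACIDS.index(aa) + 1 for aa in seq]
    rows = [codes[i:i + width] for i in range(0, len(codes), width)]
    rows[-1] = rows[-1] + [0] * (width - len(rows[-1]))
    return rows


def one_hot_channels(grid: List[List[int]]) -> List[List[List[float]]]:
    """One channel per residue type (a stand-in for learned feature maps)."""
    return [
        [[1.0 if cell == c + 1 else 0.0 for cell in row] for row in grid]
        for c in range(len(AMINO_ACIDS))
    ]


def global_average_pool(channels: List[List[List[float]]]) -> List[float]:
    """Average every channel over all of its spatial positions."""
    pooled = []
    for channel in channels:
        cells = [value for row in channel for value in row]
        pooled.append(sum(cells) / len(cells))
    return pooled


def embed(seq: str, width: int = 4) -> List[float]:
    """Map a protein of any length to a fixed-length vector (20 values)."""
    return global_average_pool(one_hot_channels(sequence_to_grid(seq, width)))


short = embed("ACD")
long = embed("ACDEFGHIKLMNPQRSTVWY" * 3)
print(len(short), len(long))  # both vectors have one entry per channel
```

Because the pooled vector has the same size for a 3-residue and a 60-residue input, a classifier placed after the pooling step needs no cropping of the sequence to a fixed length, which is the property the abstract attributes to global average pooling.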

Authors

  • Mingeun Ji
    Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea.
  • Yejin Kan
    Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea.
  • Dongyeon Kim
    Department of Chemical & Biomolecular Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107, Republic of Korea.
  • Seungmin Lee
    Department of Robotics Engineering, DGIST-ETH Microrobot Research Center, Daegu Gyeongbuk Institute of Science and Technology (DGIST), 333 Techno Jungang-daero, Hyeonpung-Myeon, Dalseong-Gun, Daegu, 42988, Republic of Korea.
  • Gangman Yi
    Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea. gangman@dongguk.edu.