Accurate Prediction of Human Essential Proteins Using Ensemble Deep Learning.

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published Date:

Abstract

Essential proteins are considered the foundation of life, as they are indispensable for the survival of living organisms. Computational methods provide a fast way to identify essential proteins, but most of them rely heavily on various kinds of biological information, especially protein-protein interaction networks, which limits their practical application. With the rapid development of high-throughput sequencing technology, sequence data have become the most accessible type of biological data. However, methods that use only protein sequence information to predict essential proteins have had limited accuracy. In this paper, we propose EP-EDL, an ensemble deep learning model that uses only protein sequence information to predict human essential proteins. EP-EDL integrates multiple classifiers to alleviate the class imbalance problem and to improve prediction accuracy and robustness. In each base classifier, we employ multi-scale text convolutional neural networks to extract useful features from protein sequence feature matrices that carry evolutionary information. Our computational results show that EP-EDL outperforms the state-of-the-art sequence-based methods. Furthermore, EP-EDL provides a more practical and flexible way for biologists to accurately predict essential proteins. The source code and datasets can be downloaded from https://github.com/CSUBioGroup/EP-EDL.
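
As an illustration of the multi-scale text convolution mentioned in the abstract, the following is a minimal sketch and not the EP-EDL implementation (which is available at the repository above). It assumes a sequence feature matrix with one row per residue, slides kernels of several widths along the sequence axis, applies a ReLU, and keeps the maximum activation per kernel. The function names, kernel widths, and weights are hypothetical and chosen only so the arithmetic is easy to follow; a trained model would learn the weights.

```python
from typing import List

# Assumed layout: one row per residue, one column per feature.
Matrix = List[List[float]]


def conv1d_max(matrix: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide one kernel along the sequence axis, apply ReLU,
    and return the max-over-time activation."""
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("sequence is shorter than the kernel width")
    best = 0.0  # ReLU floor: the pooled activation is never below zero
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            row = matrix[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        best = max(best, total)
    return best


def multi_scale_features(matrix: Matrix, kernels: List[Matrix]) -> List[float]:
    """Apply kernels of different widths (the 'scales') to the same
    matrix and collect one pooled feature per kernel."""
    return [conv1d_max(matrix, kernel) for kernel in kernels]


# Toy input: 4 residues, 2 features each (illustrative values only).
toy_matrix = [[1, 0], [0, 1], [1, 1], [0, 0]]

# Hypothetical kernels of width 1, 2 and 3.
toy_kernels = [
    [[1, 1]],
    [[1, 0], [0, 1]],
    [[-1, -1], [-1, -1], [-1, -1]],
]

features = multi_scale_features(toy_matrix, toy_kernels)
print(features)
```

In a full model, the pooled features from all kernel widths would be concatenated and passed to a classification layer; the ensemble described in the abstract would then combine the outputs of several such base classifiers.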

Authors

  • Yiming Li
    Department of Cardiology, West China Hospital, Sichuan University, Chengdu 610041, China.
  • Min Zeng
    Nephrology Department, Affiliated Hospital of Southern Medical University: Shenzhen Longhua New District People's Hospital, Shenzhen, China.
  • Yifan Wu
    Department of Information Science and Technology, Northwest University, Xi'an, Shaanxi 710127, China.
  • Yaohang Li
  • Min Li
    Hubei Provincial Institute for Food Supervision and Test, Hubei Provincial Engineering and Technology Research Center for Food Quality and Safety Test, Wuhan 430075, China.