DeepDom: Predicting protein domain boundary from sequence alone using stacked bidirectional LSTM.

Journal: Pacific Symposium on Biocomputing
Published Date:

Abstract

Protein domain boundary prediction is usually an early step in understanding protein function and structure. Most current computational domain boundary prediction methods suffer from low accuracy and limitations in handling multi-domain types, and some cannot be applied at all to certain targets, such as proteins with discontinuous domains. We developed an ab-initio protein domain predictor using a stacked bidirectional LSTM deep learning model. Our model is trained on a large number of protein sequences without feature engineering such as sequence profiles. Hence, prediction with our method is much faster than with other methods, and the trained model can be applied to any type of target protein without constraint. We evaluated DeepDom by 10-fold cross-validation and by applying it to targets in different categories from CASP 8 and CASP 9. Comparison with other methods shows that DeepDom outperforms most current ab-initio methods and even achieves better results than the top-level template-based method in certain cases. The DeepDom code and the CASP 8 and CASP 9 test data we used are available on GitHub at https://github.com/yuexujiang/DeepDom.
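
To make "from sequence alone" concrete, the following minimal sketch shows one way a raw protein sequence could be turned into per-residue model inputs and boundary labels without sequence profiles. It is an illustration under stated assumptions, not the DeepDom implementation: the function names, the 21-slot one-hot alphabet, the example sequence, and the labeling window of 2 residues are all hypothetical choices made here.

```python
# Hypothetical sketch: the names, alphabet, and window size below are
# assumptions for illustration, not taken from the DeepDom code base.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Map each residue to a 21-dim one-hot vector (last slot = unknown)."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * (len(AMINO_ACIDS) + 1)
        vector[AA_INDEX.get(residue, len(AMINO_ACIDS))] = 1
        encoded.append(vector)
    return encoded


def boundary_labels(length, boundaries, window=2):
    """Label residues within `window` positions of a boundary as 1, else 0.

    `boundaries` holds 0-based residue indices; passing several entries
    covers multi-domain or discontinuous-domain proteins.
    """
    labels = [0] * length
    for b in boundaries:
        for pos in range(max(0, b - window), min(length, b + window + 1)):
            labels[pos] = 1
    return labels


# Toy sequence with one assumed boundary at index 8.
sequence = "MKTAYIAKQRQISFVK"
features = one_hot_encode(sequence)
labels = boundary_labels(len(sequence), boundaries=[8], window=2)
```

A sequence model such as a stacked bidirectional LSTM would then consume `features` position by position and be trained to reproduce `labels`; because no profile search is needed to build the input, encoding cost grows only with sequence length.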

Authors

  • Yuexu Jiang
    Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA.
  • Duolin Wang
    Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
  • Dong Xu
    Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.