DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, most of these methods build their predictors on features extracted from protein sequences, which is computationally expensive and can greatly enlarge the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It identifies proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. The model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers in protein sequences to distinguish proteins that will yield diffraction-quality crystals from those that will not.
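
To illustrate how a convolutional filter can act as a k-mer detector on a raw sequence, the following is a minimal sketch in plain Python. It is not the DeepCrystal architecture: the 20-letter alphabet ordering, the example sequence, the hand-built filter for the k-mer "KVL" and all function names are assumptions made for illustration. In a trained network the filter weights would be learned rather than set by hand.

```python
# Standard 20 amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    rows = []
    for aa in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:  # unknown residues (e.g. X) stay all-zero
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def kmer_filter(kmer):
    """Hypothetical filter of width k that responds maximally to one k-mer."""
    return one_hot(kmer)


def conv1d(encoded, filt):
    """Slide the filter along the sequence (no padding); one score per window."""
    k = len(filt)
    scores = []
    for start in range(len(encoded) - k + 1):
        score = 0.0
        for offset in range(k):
            for channel in range(len(AMINO_ACIDS)):
                score += encoded[start + offset][channel] * filt[offset][channel]
        scores.append(score)
    return scores


def global_max_pool(scores):
    """Keep the strongest response, i.e. whether the k-mer occurs anywhere."""
    return max(scores) if scores else 0.0


# Illustrative sequence (made up for this example, not from the paper).
sequence = "MKVLAAGKVL"
activations = conv1d(one_hot(sequence), kmer_filter("KVL"))
print(activations)                   # per-window match scores
print(global_max_pool(activations))  # 3.0 when the full 3-mer is present
```

Each window score counts how many positions of the window match the filter's k-mer, so a score equal to k marks an exact occurrence. Stacking many such filters of different widths is one way a convolutional model can pick up sets of k-mers without hand-engineered features.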

Authors

  • Abdurrahman Elbasir
    College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
  • Balasubramanian Moovarkumudalvan
    Qatar Biomedical Research Institute and Hamad Bin Khalifa University, Doha, Qatar.
  • Khalid Kunji
    Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.
  • Prasanna R Kolatkar
    Qatar Biomedical Research Institute and Hamad Bin Khalifa University, Doha, Qatar.
  • Raghvendra Mall
    Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.
  • Halima Bensmail
    Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.