DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: An essential part of drug discovery is the accurate prediction of the binding affinity of new compound-protein pairs. Most standard computational methods assume that the compounds or proteins in the test data are observed during the training phase. In real-world situations, however, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three steps. In the first step, the training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long short-term memory (LSTM) layers so that the occurrence patterns of local substructures along a protein and a compound sequence are learned. In addition, to encode the interaction strength of the protein and compound substructures, we propose a two-sided attention mechanism. In the second step, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain by means of an adversarial domain adaptation approach. In the third step, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity.
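ILLUSTRATION: The abstract does not spell out the two-sided attention mechanism, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that compound and protein substructures have already been encoded into feature vectors of the same dimension, and that interaction strength is scored with a plain dot product; both are assumptions made here for illustration. The function and variable names (two_sided_attention, compound_feats, protein_feats) are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def two_sided_attention(compound_feats, protein_feats):
    """Attend in both directions between compound and protein substructures.

    Assumption: scores[i][j] = dot(compound_i, protein_j) stands in for the
    interaction strength; the paper may use a different (learned) scoring.
    """
    scores = [[dot(c, p) for p in protein_feats] for c in compound_feats]
    # Compound side: each compound substructure weights all protein substructures.
    compound_to_protein = [softmax(row) for row in scores]
    # Protein side: each protein substructure weights all compound substructures.
    protein_to_compound = [softmax(list(col)) for col in zip(*scores)]
    compound_context = [weighted_sum(w, protein_feats) for w in compound_to_protein]
    protein_context = [weighted_sum(w, compound_feats) for w in protein_to_compound]
    return compound_to_protein, protein_to_compound, compound_context, protein_context


# Toy input: 2 compound substructures and 3 protein substructures, 2 features each.
compound_feats = [[1.0, 0.0], [0.0, 1.0]]
protein_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c2p, p2c, c_ctx, p_ctx = two_sided_attention(compound_feats, protein_feats)
```

In this sketch each side receives a context vector built from the other side's substructures, so a downstream affinity predictor could consume both views. In a trained model the scoring function would normally be learned rather than fixed.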

Authors

  • Karim Abbasi
    Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran.
  • Parvin Razzaghi
    Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 4513766731, Iran.
  • Antti Poso
    Department of Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany.
  • Massoud Amanlou
    Department of Medicinal Chemistry, School of Pharmacy, Drug Design and Development Research Center, Tehran University of Medical Sciences, Tehran, Iran.
  • Jahan B Ghasemi
    Drug Design in Silico Lab., Chemistry Faculty, University of Tehran, Tehran, Iran.
  • Ali Masoudi-Nejad
    Laboratory of Systems Biology and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran. amasoudin@ut.ac.ir.