Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets.

Journal: Molecular pharmaceutics
PMID:

Abstract

The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved was the nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors has become susceptible to drug-resistant strains of HIV, and hence, alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. For compounds from this database with ≤1 μM activity, HIV-1 RT DNA polymerase inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks, as well as consensus approaches), and their predictive abilities were then compared. Our comparison of the different machine learning methods demonstrated that support vector classification, deep learning, and a consensus approach were generally comparable and not significantly different from each other, using 5-fold cross-validation and 24 training and test set combinations. These findings are in line with our previous studies for various targets: training and testing with multiple data sets does not reveal a significant difference between support vector machines and deep neural networks.
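
As a rough illustration of the kind of evaluation the abstract describes, the sketch below runs 5-fold cross-validation for one of the listed methods, Bernoulli Naive Bayes, on binary feature vectors. It is not the authors' pipeline: the abstract does not state the descriptors, software, or metrics used, so the toy fingerprints, the accuracy metric, and all function names here (`train_bernoulli_nb`, `predict`, `five_fold_accuracy`, `make_toy_data`) are assumptions made for the example.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors (hypothetical helper)."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        # Laplace-smoothed probability that each bit is set within this class
        p = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for j in range(n_features)]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        model[label] = (math.log(prior), p)
    return model


def predict(model, x):
    """Return the class (0 or 1) with the highest log-posterior score."""
    scores = {}
    for label, (log_prior, p) in model.items():
        score = log_prior
        for bit, pj in zip(x, p):
            score += math.log(pj if bit else 1.0 - pj)
        scores[label] = score
    return max(scores, key=scores.get)


def five_fold_accuracy(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and report held-out accuracy per fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return accuracies


def make_toy_data(n=200, n_bits=16, seed=1):
    """Synthetic stand-in for fingerprints: NOT data from the study."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # "Actives" set the first four bits more often; the other bits are noise
        x = [int(rng.random() < (0.8 if (label and j < 4) else 0.2))
             for j in range(n_bits)]
        X.append(x)
        y.append(label)
    return X, y


X, y = make_toy_data()
fold_acc = five_fold_accuracy(X, y)
mean_acc = sum(fold_acc) / len(fold_acc)
print("per-fold accuracy:", [round(a, 3) for a in fold_acc])
print("mean accuracy:", round(mean_acc, 3))
```

Comparing methods in this style would mean running each classifier over the same folds and testing whether the per-fold scores differ significantly; the abstract reports that support vector classification, deep learning, and a consensus approach did not.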

Authors

  • Kimberley M Zorn
    Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
  • Thomas R Lane
    Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
  • Daniel P Russo
    Collaborations Pharmaceuticals, Inc. , 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
  • Alex M Clark
    Molecular Materials Informatics, Inc. , 1900 St. Jacques #302, Montreal, Quebec H3J 2S1, Canada.
  • Vadim Makarov
    Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Leninsky Prospekt 33-2, Moscow 119071, Russia.
  • Sean Ekins
    Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA; Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Phoenix Nest, Inc., P.O. Box 150057, Brooklyn, NY 11215, USA; Hereditary Neuropathy Foundation, 401 Park Avenue South, 10th Floor, New York, NY 10016, USA. Electronic address: ekinssean@yahoo.com.