TMLRpred: A machine learning classification model to distinguish reversible EGFR double mutant inhibitors.

Journal: Chemical Biology & Drug Design
Published Date:

Abstract

The epidermal growth factor receptor (EGFR) is a clinically important therapeutic target in lung cancer. The first-generation tyrosine kinase inhibitors used in the clinic are effective against L858R-mutated EGFR. However, disease relapse driven by the resistance mutation T790M renders these inhibitors ineffective, which creates a need for new, potent EGFR inhibitors against the resistant T790M/L858R (TMLR) double mutant. Therefore, four machine learning techniques (instance-based learner (IBK), naïve Bayesian (NB), sequential minimal optimization (SMO), and random forest (RF)) were employed to develop twelve classification models on three different datasets (highly, moderately, and weakly active inhibitors). The models were validated using fivefold cross-validation and independent validation datasets. The random forest-based models showed the best performance. In addition, functional groups, PubChem fingerprints, and substructures of highly active inhibitors were compared with those of inactive compounds to identify structural features that are important for activity. To promote open-source drug discovery, a tool has been developed that incorporates the best-performing models and allows users to predict the potential of chemical molecules as anti-TMLR inhibitors. It is expected that the machine learning classification models developed in this study will pave the way for identifying novel inhibitors against the resistant EGFR double mutants.
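
The fivefold cross-validation of a fingerprint-based classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes an instance-based (nearest-neighbour) learner with Tanimoto similarity and k = 3, neither of which is stated in the abstract, and it runs on synthetic toy fingerprints rather than real inhibitor data. All function names (tanimoto, knn_predict, k_fold_indices, cross_validate, make_toy_data) are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(train, query, k=3):
    """Majority vote among the k training fingerprints most similar to the query.

    `train` is a list of (fingerprint, label) pairs with labels 1 (active) or 0 (inactive).
    """
    neighbours = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, k_folds=5, k_neighbours=3, seed=0):
    """Return overall accuracy when every fold is held out exactly once."""
    correct = 0
    for held_out in k_fold_indices(len(data), k_folds, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        for i in held_out:
            fingerprint, label = data[i]
            correct += int(knn_predict(train, fingerprint, k_neighbours) == label)
    return correct / len(data)


def make_toy_data(n_per_class=20, seed=1):
    """Synthetic fingerprints (an assumption for illustration, not real inhibitors).

    "Actives" mostly set bits 0-9, "inactives" mostly set bits 10-19,
    and both classes receive a few random noise bits in the range 20-39.
    """
    rng = random.Random(seed)
    data = []
    for label, core in ((1, range(0, 10)), (0, range(10, 20))):
        for _ in range(n_per_class):
            bits = {b for b in core if rng.random() < 0.8}
            bits |= {rng.randrange(20, 40) for _ in range(3)}
            data.append((frozenset(bits), label))
    return data


if __name__ == "__main__":
    accuracy = cross_validate(make_toy_data())
    print(f"fivefold cross-validated accuracy: {accuracy:.2f}")
```

In practice the fingerprints would be computed from real structures (for example PubChem fingerprints, as named in the abstract), and a held-out independent validation set would be scored in addition to the cross-validation folds.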

Authors

  • Ravi Saini
    School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, India.
  • Shehnaz Fatima
    Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, Noida, India.
  • Subhash Mohan Agarwal
    Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, Noida, India.