Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification.

Journal: Briefings in bioinformatics
PMID:

Abstract

Small molecules play a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening because they form colloidal aggregates. Such false positives often cost significant research time and money. To identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we used several machine learning techniques to develop classification models for promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, we trained models that combine four molecular representations with various machine learning algorithms. The best-performing model pairs path-based FP2 fingerprints with the cubic support vector machine algorithm; it achieved the highest accuracy and area under the receiver operating characteristic curve on both the validation and test datasets while maintaining high sensitivity and specificity (>0.93). Additionally, we propose a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations (SHAP) analysis. Several comparative studies show that GSA is a time-efficient and accurate approach for identifying the descriptors that contribute most to model prediction, especially when the dataset contains many data entries but a limited set of descriptors. Our models and GSA findings can guide screening library design to minimize false positives.
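The four figures of merit named in the abstract (accuracy, sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `classification_metrics` and `roc_auc`, the label convention (1 = aggregator, 0 = nonaggregator), the 0.5 decision threshold, and the toy scores are all assumptions made for illustration, and the numbers are not results from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy for binary labels.

    Assumed convention (not from the source): 1 = aggregator, 0 = nonaggregator.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # aggregators caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonaggregators cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed aggregators
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative (ties count as one half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative: classifier scores for eight hypothetical compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise ranking form of the AUC is quadratic in the number of compounds, which is fine for a small illustration; for a dataset of the size described above, a sort-based implementation would be the more practical choice.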

Authors

  • Luxuan Wang
    School of Clinical Medicine, Ningxia Medical University, Yinchuan, China.
  • Beihong Ji
    Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.
  • Jingchen Zhai
    Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.
  • Junmei Wang
    Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: junmei.wang@pitt.edu.