Enhancing HCV NS3 Inhibitor Classification with Optimized Molecular Fingerprints Using Random Forest.

Journal: International journal of molecular sciences
PMID:

Abstract

The classification of Hepatitis C virus (HCV) NS3 inhibitors is essential for identifying potential antiviral agents through computational methods. This study aims to develop an optimized machine learning (ML) model using random forest (RF) and molecular fingerprints to accurately classify HCV NS3 inhibitors. A dataset of 965 molecules was retrieved from the ChEMBL database, and 290 bioactive compounds were selected for model training. Twelve molecular fingerprint descriptors were tested, and the CDK graph-only fingerprint yielded the best performance. In addition to RF, performance comparisons of other classifiers such as instance-based k-nearest neighbor (IBk), logistic regression (LR), AdaBoost, and OneR were conducted using WEKA with various molecular fingerprint descriptors. The optimized RF model achieved an accuracy of 89.6552%, a mean absolute error (MAE) of 0.2114, a root mean square error (RMSE) of 0.3304, and a Matthews correlation coefficient (MCC) of 0.7950 on the test set. These results highlight the effectiveness of optimized molecular fingerprints in enhancing virtual screening (VS) for HCV inhibitors. This approach offers a data-driven method for drug discovery.

Authors

  • Sema Atasever
    Computer Engineering Department, Nevsehir Hacı Bektas Veli University, Nevsehir, Turkey.