Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection.

Journal: PloS one
Published Date:

Abstract

Around 1.5 million new cases of Hepatitis C Virus (HCV) infection are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the predictive accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. It uses a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Oversampling and undersampling techniques are applied to address class imbalance in the dataset, and the feature set is reduced using the F-test in one-way analysis of variance (ANOVA). Bagging-enhanced ensembles of six classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV) score, area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model with oversampling demonstrated superior performance, achieving 98.37% accuracy, a 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, a 97.79% F1 score, 98.06% balanced accuracy, a 98.05% G-mean, a 1.63% error rate, an AUC of 0.98, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches for improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
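Several of the reported metrics are functions of the same confusion matrix, so a short sketch may help relate them. The Python below uses the standard binary-classification definitions; the function names and the confusion-matrix counts are hypothetical illustrations, not the authors' code or data, and the abstract does not state whether the published figures were averaged across classes or folds.

```python
import math


def balanced_accuracy(recall, selectivity):
    """Arithmetic mean of recall (sensitivity) and selectivity (specificity)."""
    return (recall + selectivity) / 2


def g_mean(recall, selectivity):
    """Geometric mean of recall and selectivity."""
    return math.sqrt(recall * selectivity)


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    at least one positive prediction is made).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    selectivity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "selectivity": selectivity,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy(recall, selectivity),
        "g_mean": g_mean(recall, selectivity),
    }


# Hypothetical counts, chosen only to illustrate the formulas.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)

# Recall and selectivity as reported in the abstract for Bagging k-NN.
reported_balanced = balanced_accuracy(0.9793, 0.9818)
reported_g_mean = g_mean(0.9793, 0.9818)
```

Under these definitions, the reported recall (97.93%) and selectivity (98.18%) give a balanced accuracy of about 98.06% and a G-mean of about 98.05%, which agrees with the values stated in the abstract; likewise, the 1.63% error rate is the complement of the 98.37% accuracy.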

Authors

  • Ekramul Haque Tusher
    Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pahang, Malaysia.
  • Mohd Arfian Ismail
    Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia.
  • Abdullah Akib
    Industrial Engineering, Lamar University, Beaumont, Texas, United States of America.
  • Lubna A Gabralla
    Department of Computer Science, Applied College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
  • Ashraf Osman Ibrahim
    Faculty of Computing & Informatics, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu 88400, Sabah, Malaysia.
  • Hafizan Mat Som
    Computer and Information Sciences Department, Faculty of Science and Information Technology, Universiti Teknologi Petronas, Perak, Malaysia.
  • Muhammad Akmal Remli
    Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, City Campus, Kota Bharu 16100, Kelantan, Malaysia.