Efficient diagnosis of diabetes mellitus using an improved ensemble method.

Journal: Scientific Reports
PMID:

Abstract

Diabetes is a growing health concern in developing countries and a cause of considerable mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have reported low classification accuracy due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy. The Pima Indians Diabetes dataset from the UCI ML Repository served as the dataset. Data preprocessing consisted of cleaning the dataset by replacing missing values with column means and selecting highly correlated features using forward and backward selection methods. The dataset was split into two parts: training (70%) and testing (30%). Classification was implemented in Python in Jupyter Notebook, in two design phases. The first phase used J48, Classification and Regression Tree (CART), and Decision Stump (DS) to create a random forest model. The second phase employed the same algorithms alongside sequential ensemble methods (XGBoost, AdaBoostM1, and Gradient Boosting), using an average voting algorithm for binary classification. Evaluation revealed that XGBoost, AdaBoostM1, and Gradient Boosting achieved classification accuracies of 100%, with F1 score, MCC, Precision, Recall, AUC-ROC, and AUC-PR all equal to 1.00, indicating reliable predictions of diabetes presence. Researchers and practitioners can leverage the predictive model developed in this work to make quick predictions of diabetes mellitus, which could save many lives.
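
The steps named in the abstract (column-mean imputation, a 70/30 split, average voting, and MCC) can be illustrated with a minimal sketch. It is not the authors' code. The function names are hypothetical, missing values are assumed to be marked as None, and the per-model probabilities are made-up numbers standing in for the outputs of trained classifiers. Only the Python standard library is used.

```python
import random
from math import sqrt
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def split_70_30(rows, labels, seed=0):
    """Shuffle indices reproducibly, then use 70% for training and 30% for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_train = (7 * len(idx)) // 10  # integer arithmetic avoids float rounding
    train, test = idx[:n_train], idx[n_train:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def average_vote(probabilities, threshold=0.5):
    """Average the per-model positive-class probabilities and threshold the mean."""
    avg = sum(probabilities) / len(probabilities)
    return (1 if avg >= threshold else 0), avg


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative usage with made-up values (not data from the study).
cleaned = impute_column_means([[1.0, None], [3.0, 4.0], [None, 8.0]])
label, avg_prob = average_vote([0.9, 0.6, 0.3])  # three hypothetical models
print(cleaned, label, round(avg_prob, 3))
```

In this sketch, average voting means soft voting: each model contributes its estimated probability of diabetes, and the ensemble predicts the positive class when the mean reaches the threshold. Whether the study averaged probabilities or class labels is not stated in the abstract, so this is an assumption.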

Authors

  • Blessing Oluwatobi Olorunfemi
    Department of Computer Science, Faculty of Natural Sciences, Redeemer's University, Ede, Osun state, Nigeria.
  • Adewale Opeoluwa Ogunde
    Department of Computer Science, Faculty of Natural Sciences, Redeemer's University, Ede, Osun state, Nigeria.
  • Ahmad Almogren
    Chair of Pervasive and Mobile Computing, College of Computer and Information Sciences, King Saud University, Riyadh, 11543, Saudi Arabia.
  • Abidemi Emmanuel Adeniyi
    Department of Computer Science, Bowen University, Iwo, Nigeria.
  • Sunday Adeola Ajagbe
    Department of Computer Science, University of Zululand, KwaDlangezwa, 3886, South Africa.
  • Salil Bharany
    Department of Computer Engineering & Technology, Guru Nanak Dev University, Amritsar 143005, India.
  • Ayman Altameem
    Department of Natural and Engineering Sciences, College of Applied Studies and Community Services, King Saud University, 11543, Riyadh, Saudi Arabia.
  • Ateeq Ur Rehman
    Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
  • Asif Mehmood
    Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea.
  • Habib Hamam
    School of Electrical Engineering, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa.