Pediatric diabetes prediction using machine learning.

Journal: Scientific Reports
Published Date:

Abstract

Diabetes is a chronic condition that affects a substantial portion of the global population and is linked to elevated mortality rates and a range of severe health complications. Despite its clinical importance, progress in diabetes research is often constrained by the limited availability of comprehensive datasets and robust predictive models. To address these challenges, researchers are increasingly turning to big data analytics and machine learning (ML) methodologies. This study presents the development of an ML-based system that predicts the likelihood of diabetes and classifies its type. A novel dataset, termed the Diabetes Types Dataset, was constructed by integrating four heterogeneous sources: paediatric data from the Mansoura University Children Hospital repository, the Pima Indian Diabetes (PIMA) dataset, the Pone dataset, and a Gestational Diabetes dataset. The classification of diabetes types was treated as a multiclass problem and addressed with a suite of supervised ML algorithms: Artificial Neural Networks (ANN), Logistic Regression, Naive Bayes, Decision Trees, Adaptive Boosting, Random Forests, Gradient Boosting, Support Vector Machines, and K-Nearest Neighbors. Model performance was evaluated using four metrics: Accuracy, Precision, Mean Squared Error, and Area Under the Receiver Operating Characteristic Curve. Among the models tested, the ANN classifier demonstrated the highest accuracy, reaching a peak of 99.98%. Further validation on an external dataset, referred to as diabetes_prediction, confirmed the model's robustness with consistent accuracy. Additionally, the proposed system was applied to a publicly available dataset, diabetes_Dataset, whose 34 features are used to predict 12 distinct types of diabetes. The results suggest that this ML-driven approach can significantly enhance the ability of healthcare professionals to detect and classify diabetes types, thereby supporting early intervention and improved disease management.
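
As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy, macro-averaged precision, and mean squared error for a multiclass prediction. It is not the authors' code. The label encoding (0 = no diabetes, 1 = type 1, 2 = type 2, 3 = gestational), the sample predictions, the choice of macro averaging, and the function names are all assumptions made for this example; the abstract does not state how precision was averaged or how mean squared error was applied to class labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean over classes of TP / (TP + FP)."""
    classes = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # TP + FP per class
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    per_class = [
        true_pos[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)


def mean_squared_error(y_true, y_pred):
    """Mean squared difference between integer-encoded labels.

    Assumption: the value depends on the (arbitrary) integer encoding
    of the classes, so it is only comparable under a fixed encoding.
    """
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 0 = no diabetes, 1 = type 1, 2 = type 2, 3 = gestational
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("macro precision:", macro_precision(y_true, y_pred))
print("mean squared error:", mean_squared_error(y_true, y_pred))
```

The area under the ROC curve is omitted from this sketch because it requires per-class probability scores rather than hard labels; in a multiclass setting it is commonly computed one-vs-rest and then averaged.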

Authors

  • Abeer El-Sayyid El-Bashbishy
    Information Systems Department, Faculty of Computer and Information Sciences, Mansoura University, Mansoura, Egypt.
  • Hazem M El-Bakry
    Head of Information Systems Department, Faculty of Computer and Information Sciences, Mansoura University, Mansoura, Egypt.