Predicting Diabetes Using Machine Learning: A Comparative Study of Classifiers
Journal: arXiv
Published Date: May 11, 2025
Abstract
Diabetes remains a significant global health challenge, contributing to
severe complications such as kidney disease, vision loss, and heart issues. The
application of machine learning (ML) in healthcare enables efficient and
accurate disease prediction, offering avenues for early intervention and
patient support. Our study introduces a diabetes prediction framework that
leverages both traditional ML techniques (Logistic Regression, SVM, Naïve
Bayes, and Random Forest) and advanced ensemble methods (AdaBoost, Gradient
Boosting, Extra Trees, and XGBoost). Central to our
approach is the development of a novel model, DNet, a hybrid architecture
combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM)
layers for effective feature extraction and sequential learning. The DNet model
comprises an initial convolutional block for capturing essential features,
followed by a residual block with skip connections to facilitate efficient
information flow. Batch Normalization and Dropout are employed for robust
regularization, and an LSTM layer captures temporal dependencies within the
data. Using a Kaggle-sourced real-world diabetes dataset, our model evaluation
spans cross-validation accuracy, precision, recall, F1 score, and ROC-AUC.
Among the models, DNet achieves the highest performance, with an accuracy of
99.79% and a ROC-AUC of 99.98%, indicating its potential for accurate
diabetes prediction. This robust hybrid architecture showcases the value of
combining CNN and LSTM layers, emphasizing its applicability in medical
diagnostics and disease prediction tasks.
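
The evaluation metrics named in the abstract (accuracy, precision, recall, F1 score, and ROC-AUC) can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `confusion_counts`, `classification_metrics`, and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and cross-validation is omitted.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = diabetic, 0 = non-diabetic)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example (not taken from the Kaggle dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

On this toy input, accuracy, precision, recall, and F1 all equal 0.75, and the ROC-AUC is 0.9375. Accuracy depends on the chosen threshold, whereas ROC-AUC depends only on how the scores rank positives against negatives, which is why the two are reported separately.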