Data-driven diabetes mellitus prediction and management: a comparative evaluation of decision tree classifier and artificial neural network models along with statistical analysis.
Journal: Scientific Reports
Published Date: Jun 2, 2025
Abstract
Diabetes Mellitus is a chronic metabolic disorder that affects a substantial share of the global population and, if left unchecked, leads to complications such as retinopathy, nephropathy, neuropathy, foot problems, heart attacks, and strokes. Prompt detection and diagnosis are crucial to managing and averting these complications. This study compares the effectiveness of a Decision Tree Classifier and an Artificial Neural Network (ANN) in predicting Diabetes Mellitus. The Decision Tree Classifier performed better, achieving 97.7% accuracy compared to the ANN's 94.7%. It also achieved higher precision (96.9% vs. 88.8%) and recall (96.5% vs. 90.2%), along with an F1 score of 96.5% versus 90.2%. The Matthews Correlation Coefficient (MCC) indicated closer agreement between predictions and actual labels for the Decision Tree Classifier (87.4%) than for the ANN (78%). Furthermore, the Area Under the Curve (AUC) score of 96% for the Decision Tree Classifier was higher than that of the ANN (78%). Feature importance analysis identified glycated hemoglobin (HbA1c) as the most important factor in predicting Diabetes Mellitus. Diabetic patients showed markedly higher cholesterol and triglyceride levels, which increase cardiovascular risk, while High-Density Lipoprotein (HDL) and Low-Density Lipoprotein (LDL) levels showed no significant difference between diabetics and non-diabetics. However, Very Low-Density Lipoprotein (VLDL) was significantly elevated, suggesting altered lipid transport in diabetes. Body Mass Index (BMI) was also notably higher in diabetics, reinforcing the link between obesity and diabetes risk.
Principal component analysis further highlighted five clusters of health-related variables: age-related metabolic indicators (AGE, HbA1c, BMI), kidney function markers (creatinine (Cr), Urea), cardiovascular lipid profiles (Cholesterol, LDL), lipid transport (VLDL), and a protective cardiovascular indicator (HDL). The study highlights the superiority of the Decision Tree Classifier in predicting Diabetes Mellitus, suggesting its potential for significant clinical applications in diagnosis and management.
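
For readers unfamiliar with the evaluation metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and MCC can be derived from the four cells of a binary confusion matrix. It is an illustration only and not the authors' code: the function name `classification_metrics` and the example counts are hypothetical, and the abstract does not state whether the reported figures were averaged across classes. AUC is omitted because it requires predicted scores rather than counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Ratios with a zero denominator
    are reported as 0.0 (an assumption made here for simplicity).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # MCC ranges from -1 to 1 and uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC draws on all four cells, it can be considerably lower than accuracy when the classes are imbalanced, which is one reason it is often reported alongside accuracy.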