Which approach better predicts diabetes: Traditional econometric methods or machine learning? Evidence from a cross-sectional study in South Korea.
Journal: Computers in Biology and Medicine
PMID: 40121801
Abstract
To prevent chronic disease from worsening, it is important to detect and predict it at an early stage, which makes prediction accuracy particularly important. To investigate the accuracy of different methods, this study compares the out-of-sample errors of machine learning algorithms and traditional econometric methods in predicting diabetes. The prediction target is fasting blood glucose, and the machine learning algorithms used are stepwise selection, bagging, random forests, and support vector machines (SVM). In addition, we demonstrate a linear combination of these machine learning algorithms. The findings indicate that the combined model outperforms both traditional econometric models and individual machine learning algorithms. However, the predictive performance of individual machine learning models does not consistently surpass that of traditional econometric approaches. Based on the characteristics of the data analyzed in this study, a possible explanation is that traditional econometric methods may perform better when the underlying relationships are linear. Finally, the analysis of variable importance suggests that medical indicators and physical condition may play a more significant role than hereditary factors in determining fasting blood glucose. To further validate our results, we applied the same methodology to predict hypertension using the same dataset. The findings similarly indicate that the predictive ability of individual machine learning algorithms does not always surpass that of traditional econometric models, and that a linear combination of the four machine learning algorithms enhances the predictive accuracy for hypertension.
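The idea of linearly combining several predictors and judging them by out-of-sample error can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not say how the combination weights were estimated, so least-squares weights fitted on a separate subset are an assumption here. The data are synthetic, and the four columns of `base_preds` are hypothetical stand-ins for predictions from the four algorithms. The helper names (`fit_combination_weights`, `combine`, `mse`) are illustrative only.

```python
import numpy as np

def fit_combination_weights(base_preds, y):
    """Least-squares intercept and weights for a linear combination of base predictions."""
    design = np.column_stack([np.ones(len(y)), base_preds])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the per-model weights

def combine(base_preds, coef):
    """Apply the fitted linear combination to a matrix of base predictions."""
    return coef[0] + base_preds @ coef[1:]

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
n = 600

# Synthetic target (assumption): values loosely resembling fasting glucose readings.
y = rng.normal(100.0, 15.0, size=n)

# Hypothetical base-model predictions: the target distorted by model-specific
# scaling, bias, and noise. These are NOT outputs of real fitted models.
base_preds = np.column_stack([
    y + rng.normal(0.0, 8.0, n),
    0.9 * y + 5.0 + rng.normal(0.0, 10.0, n),
    y + rng.normal(2.0, 12.0, n),
    1.1 * y - 8.0 + rng.normal(0.0, 9.0, n),
])

# Estimate the weights on one subset, evaluate on a disjoint subset.
fit_idx = np.arange(0, 400)
test_idx = np.arange(400, n)
coef = fit_combination_weights(base_preds[fit_idx], y[fit_idx])

individual_fit_mse = [mse(y[fit_idx], base_preds[fit_idx, j]) for j in range(4)]
combined_fit_mse = mse(y[fit_idx], combine(base_preds[fit_idx], coef))

individual_test_mse = [mse(y[test_idx], base_preds[test_idx, j]) for j in range(4)]
combined_test_mse = mse(y[test_idx], combine(base_preds[test_idx], coef))

print("weights (intercept first):", np.round(coef, 3))
print("individual out-of-sample MSE:", np.round(individual_test_mse, 2))
print("combined out-of-sample MSE:", round(combined_test_mse, 2))
```

On the subset used to fit the weights, the combination cannot do worse than any single base predictor, because each base predictor is itself a special case of the combination (weight one, all other weights zero). Out of sample there is no such guarantee, which is why the comparison on held-out data is the informative one.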