Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms.

Journal: Scientific reports

PMID: 40374805

Abstract

Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal), but their efficiency varies. In this study, we aimed to develop a tool for discriminating between IDA and Thal using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data were collected from 1143 patients with anemia and low mean corpuscular volume (382 patients with IDA, 635 with Thal, and 126 with both IDA and Thal). The data were randomly divided into training and testing datasets in a ratio of 80:20. The RF and GB models showed good diagnostic performance for predicting IDA and Thal in both the training and testing datasets. In the testing dataset for predicting binary outcomes (IDA versus Thal), GB and RF both had an accuracy of 90.7% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. Diagnostic performance was lower when patients with both IDA and Thal were included: GB and RF showed accuracies of 80.4% and 82.2%, respectively, and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using the GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions.
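The evaluation steps named in the abstract (a random 80:20 split, then accuracy and AUC-ROC on the testing portion) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the seed, and the toy labels and scores are assumptions for illustration, and model fitting itself is omitted (the abstract does not say which software was used for RF and GB).

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it 80:20 (training:testing).

    The seed and the rounding rule are assumptions; the abstract only
    states that the split was random with an 80:20 ratio.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """Binary AUC-ROC as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes taken from the abstract; the labels are placeholders.
cohort = ["IDA"] * 382 + ["Thal"] * 635 + ["IDA+Thal"] * 126
train, test = split_80_20(cohort)

# Toy labels and scores (hypothetical, not study data).
toy_truth = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_accuracy = accuracy(toy_truth, toy_pred)
toy_auc = auc_roc(toy_truth, toy_scores)
```

The pairwise form of AUC-ROC is quadratic in the number of cases, which is acceptable for a testing set of a few hundred patients. This binary definition matches only the IDA-versus-Thal setting; the abstract does not state how AUC-ROC was computed for the three-group analysis.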

Authors

Wanicha Tepakhan
Department of Pathology, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand.

Wisarut Srisintorn
Department of Family Medicine and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand.

Tipparat Penglong
Department of Pathology, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand.

Pirun Saelue
Faculty of Medicine, Prince of Songkla University, Songkhla, Thailand.

Keywords

Adolescent; Adult; Algorithms; Anemia, Iron-Deficiency; Boosting Machine Learning Algorithms; Diagnosis, Differential; Erythrocyte Indices; Female; Humans; Machine Learning; Male; Middle Aged; Random Forest; ROC Curve; Thalassemia; Young Adult
