A comprehensive case study of deep learning on the detection of alpha thalassemia and beta thalassemia using public and private datasets.

Journal: Scientific Reports
PMID:

Abstract

This study evaluates a deep learning model, the Convolutional Neural Network (CNN), alongside XGBoost, a gradient-boosted decision tree method, for predicting alpha and beta thalassemia using both public and private datasets. Thalassemia is a genetic disorder that impairs hemoglobin production, leading to anemia and other health complications. Early diagnosis is essential for effective management and prevention of severe health issues. The study applied CNN and XGBoost to two case studies: one for alpha thalassemia and the other for beta thalassemia. Public datasets were sourced from medical databases, while private datasets were collected from clinical records, offering a more comprehensive feature set and larger sample sizes. After data preprocessing and splitting, model performance was evaluated. XGBoost achieved 99.34% accuracy on the private dataset for alpha thalassemia, while CNN reached 98.10% accuracy on the private dataset for beta thalassemia. The superior performance on private datasets was attributed to better data quality and volume. This study highlights the effectiveness of deep learning in medical diagnostics, demonstrating that high-quality data can significantly enhance the predictive capabilities of AI models. By integrating CNN and XGBoost, this approach offers a robust method for detecting thalassemia, potentially improving early diagnosis and reducing disease-related mortality.

Authors

  • Muhammad Umar Nasir
    Riphah School of Computing and Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore 54000, Pakistan.
  • Muhammad Tahir Naseem
    Research Institute of Human Ecology, Yeungnam University, Gyeongsan 38541, Korea.
  • Taher M Ghazal
    Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Selangor, Malaysia.
  • Muhammad Zubair
    Swedish University of Agricultural Sciences, Department of Plant Breeding and Biotechnology Balsgård, Fjälkestadsvägen 459, SE-291 94 Kristianstad, Sweden.
  • Oualid Ali
    Computer Sciences Department, College of Arts & Science, Applied Science University, P.O. Box 5055, Manama, Kingdom of Bahrain.
  • Sagheer Abbas
    Department of Computer Science, National College of Business Administration and Economics, Lahore, Pakistan.
  • Munir Ahmad
    School of Computer Science, National College of Business Administration & Economics, Lahore 54000, Pakistan.
  • Khan Muhammad Adnan
    Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam-si, 13557, Republic of Korea. adnan@gachon.ac.kr.