Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction.

Journal: Scientific Reports
Published Date:

Abstract

Leukemia is one of the most prevalent forms of blood cancer and affects individuals across all age groups. Early and accurate diagnosis is crucial for effective treatment and improved clinical outcomes. Peripheral blood smear analysis, a key minimally invasive diagnostic tool, is limited by subjective interpretation, inter-observer variability, and a shortage of readily available expertise. Although deep learning approaches, particularly Convolutional Neural Networks (CNNs), have performed strongly on binary classification tasks, multiclass classification of leukemia subtypes remains challenging because data are limited and the subtypes are morphologically similar. This study presents a novel hybrid methodology that combines pre-trained CNN architectures (VGG16, InceptionV3, and ResNet50) with classifiers including Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and the neural network-based Multi-Layer Perceptron (MLP). The method uses two publicly available datasets, the Acute Lymphoblastic Leukemia Image Database (ALL-IDB) and the Munich AML Morphology Dataset, to classify cells as healthy cells, lymphoblasts, or myeloblasts. The pre-trained CNNs serve as feature extractors, and the classifiers make the final predictions from the extracted features. The InceptionV3 + SVM combination achieved the highest accuracy (88%), followed closely by VGG16 + XGBoost (87%). MLP-based models also performed well, capturing non-linear patterns in the extracted features. In contrast, ResNet50-based models performed worse, likely because of overfitting on the small dataset. The novelty of this work lies in integrating pre-trained deep learning architectures with hybrid classification techniques, enabling robust multiclass classification in data-constrained scenarios. The approach offers a scalable diagnostic tool that could improve the speed and reliability of leukemia subtype identification and support clinical decision-making and patient care.
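The two-stage design described in the abstract, in which a frozen pre-trained CNN produces feature vectors that separate classifiers then label, can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' code: the function names, the choice of InceptionV3 with global average pooling, the 80/20 stratified split, and all hyperparameters are assumptions. XGBoost is omitted to avoid an extra dependency.

```python
"""Hypothetical two-stage pipeline: frozen pre-trained CNN features -> classical classifiers.

Assumptions (not taken from the paper): InceptionV3 with ImageNet weights,
global average pooling, an 80/20 stratified split, and the hyperparameters below.
"""
from typing import Dict, Optional

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The three cell classes named in the abstract; the integer encoding 0/1/2 is an assumption.
CLASSES = ("healthy", "lymphoblast", "myeloblast")


def extract_cnn_features(images: np.ndarray, weights: Optional[str] = "imagenet") -> np.ndarray:
    """Map RGB images of shape (N, H, W, 3), pixel values in [0, 255],
    to InceptionV3 feature vectors (H and W must be at least 75).

    TensorFlow is imported lazily so the classifier stage can run on
    precomputed features without it.
    """
    from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

    # include_top=False drops the ImageNet classification head; pooling="avg"
    # averages the last feature map into one vector per image.
    backbone = InceptionV3(
        weights=weights, include_top=False, pooling="avg", input_shape=images.shape[1:]
    )
    # astype() makes a float copy, since preprocess_input may scale its input in place.
    return backbone.predict(preprocess_input(images.astype("float32")), verbose=0)


def compare_classifiers(features: np.ndarray, labels: np.ndarray, seed: int = 0) -> Dict[str, float]:
    """Fit several classifiers on the same CNN features; return held-out accuracy per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    candidates = {
        # SVM and MLP are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(256,), max_iter=1000, random_state=seed),
        ),
    }
    scores = {}
    for name, clf in candidates.items():
        clf.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(x_test))
    return scores
```

With real smear images and integer labels, `compare_classifiers(extract_cnn_features(images), labels)` compares the classifiers on InceptionV3 features. Swapping in `tensorflow.keras.applications.VGG16` or `ResNet50` (each paired with its own `preprocess_input`) and adding `xgboost.XGBClassifier` to the dictionary would cover the other backbone and classifier combinations the abstract lists.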

Authors

  • Sazzli Kasim
    Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia.
  • Sorayya Malek
    Bioinformatics Division, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia.
  • Junjie Tang
    Faculty of Science, Bioinformatics Division, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia.
  • Xue Ning Kiew
    Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia.
  • Song Cheen
    Microbiome Research Centre, Monash University Malaysia, Subang Jaya, Malaysia.
  • Bryan Liew
    Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia.
  • Norashikin Saidon
    Faculty of Medicine, Universiti Teknologi MARA (UiTM), Sungai Buloh Campus, Sungai Buloh, Malaysia.
  • Raja Ezman
    Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia.
  • Raja Shariff
    Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia.