Improving prediction of blood cancer using leukemia microarray gene data and Chi2 features with weighted convolutional neural network.

Journal: Scientific reports
PMID:

Abstract

Blood cancer has become a growing concern over the past decade, and early diagnosis is essential for timely and effective treatment. The current diagnostic process, which involves a battery of tests and medical experts, is costly and time-consuming, so an automated diagnostic system capable of accurate predictions is needed. Machine learning applied to leukemia microarray gene data is an active area of medical research for blood cancer diagnosis. Despite a great deal of research, further improvements are needed to reach adequate levels of accuracy and efficacy. This work presents a supervised machine-learning approach for blood cancer prediction using leukemia microarray gene data comprising 22,283 genes. Chi-squared (Chi2) feature selection and synthetic minority oversampling technique (SMOTE)-Tomek resampling are used to address the high dimensionality and class imbalance of the dataset. SMOTE-Tomek creates synthetic data to balance the target classes, and Chi2 selects the most important of the 22,283 genes as features for training the learning models. A novel weighted convolutional neural network (CNN) model, which draws on three separate CNN models, is proposed for classification. To assess the proposed approach, extensive experiments are carried out on the datasets, including a performance comparison with state-of-the-art techniques. The weighted CNN outperforms the other models when coupled with SMOTE-Tomek and Chi2, achieving 99.9% accuracy. Results from k-fold cross-validation further support the superior performance of the proposed model.
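
The abstract does not state how the three CNN models are combined or how their weights are chosen. As a minimal sketch, assuming the combination is a weighted average of each model's per-class probabilities (weighted soft voting), the idea might look like the following. The function name `weighted_soft_vote`, the probability arrays, and the weights are all hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_soft_vote(prob_list, weights):
    """Combine per-model class probabilities with a weighted average.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   one non-negative weight per model (normalized here).
    Returns the combined probabilities and the predicted class indices.
    """
    probs = np.stack(prob_list, axis=0)        # (n_models, n_samples, n_classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # make the weights sum to 1
    combined = np.tensordot(w, probs, axes=1)  # (n_samples, n_classes)
    return combined, combined.argmax(axis=1)


# Hypothetical softmax outputs of three models for two samples, three classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p3 = np.array([[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]])

# Assumed weights; in practice they could be derived from validation accuracy.
combined, labels = weighted_soft_vote([p1, p2, p3], weights=[0.5, 0.3, 0.2])
print(combined)
print(labels)
```

In this toy input the second sample is assigned the third class even though the highest-weighted model prefers the second class, because the other two models agree with each other and together outweigh it.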

Authors

  • Ebtisam Abdullah Alabdulqader
    Department of Information Technology, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
  • Aisha Ahmed Alarfaj
    Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia.
  • Muhammad Umer
    Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan.
  • Ala' Abdulmajid Eshmawi
    Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia.
  • Shtwai Alsubai
    Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia.
  • Tai-Hoon Kim
    School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, 50, Daehak-ro, Yeosu-si, 59626, Jeollanam-do, Republic of Korea. taihoonn@chonnam.ac.kr.
  • Imran Ashraf
    Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea.