SGA-Driven feature selection and random forest classification for enhanced breast cancer diagnosis: A comparative study.

Journal: Scientific Reports
Published Date:

Abstract

In this study, we propose a novel approach to breast cancer classification that integrates the Seagull Optimization Algorithm (SGA) for feature selection with the Random Forest (RF) classifier. The novelty lies in the first application of SGA to gene selection for breast cancer diagnosis: SGA systematically explores the feature space to identify the most informative gene subsets, thereby improving classification accuracy and reducing computational complexity. The selected features are then classified using RF, which is known for its robustness and high accuracy on complex datasets. To evaluate the proposed method, we compared it with other classifiers, including Linear Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The proposed SGA-RF combination achieved a best mean accuracy of 99.01% with 22 genes, outperforming the other methods and performing consistently across varying feature subsets. The mean accuracies ranged from 85.35% to 94.33%, highlighting a balance between feature reduction and classification accuracy. Future work will explore integrating other nature-inspired algorithms and deep learning models to further enhance performance and clinical applicability.
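
The wrapper-style selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the seagull update rules: the search below is a generic population-based stand-in that perturbs the best binary feature mask found so far. The names `wrapper_search`, `toy_fitness`, and `INFORMATIVE` are hypothetical, and the toy fitness stands in for the classifier accuracy that a real pipeline would compute.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[bool]


def wrapper_search(
    fitness: Callable[[Sequence[bool]], float],
    n_features: int,
    pop_size: int = 10,
    n_iter: int = 30,
    flip_prob: float = 0.1,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Search over binary feature masks (generic stand-in, NOT the SGA equations)."""
    rng = random.Random(seed)
    # Random initial masks: True means the feature (gene) is kept.
    population = [
        [rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)
    ]
    best_mask = max(population, key=fitness)
    best_score = fitness(best_mask)
    for _ in range(n_iter):
        # Build candidates by copying the best mask and flipping bits at random.
        candidates = [
            [(not bit) if rng.random() < flip_prob else bit for bit in best_mask]
            for _ in range(pop_size)
        ]
        for candidate in candidates:
            score = fitness(candidate)
            if score > best_score:
                best_mask, best_score = candidate, score
    return best_mask, best_score


# Hypothetical toy problem: features 0, 3 and 5 are "informative".
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Sequence[bool]) -> float:
    """Reward informative features, lightly penalise the subset size."""
    if not any(mask):
        return 0.0
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.05 * sum(mask)


best_mask, best_score = wrapper_search(toy_fitness, n_features=12)
selected = [i for i, keep in enumerate(best_mask) if keep]
print(selected, round(best_score, 2))
```

In a setting closer to the one the abstract describes, the fitness could instead be the mean cross-validated accuracy of a Random Forest trained on the selected columns, for example with scikit-learn's `cross_val_score` and `RandomForestClassifier`. That choice is an assumption here, not a detail confirmed by the abstract.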

Authors

  • Abrar Yaqoob
    School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India.
  • Navneet Kumar Verma
    School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India.
  • Mushtaq Ahmad Mir
    Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Khalid University, Saudi Arabia.
  • Ghanshyam G Tejani
    Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan, 320315, Taiwan.
  • Nashwa Hassan Babiker Eisa
    Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Khalid University, Abha, 61421, Saudi Arabia.
  • Hind Mamoun Hussien Osman
    Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Khalid University, Abha, 61421, Saudi Arabia.
  • Mohd Asif Shah
    Bakhtar University, Kabul, Afghanistan.