Ensemble-based multiclass lung cancer classification using hybrid CNN-SVD feature extraction and selection method.

Journal: PloS one
Published Date:

Abstract

Lung cancer (LC) is a leading cause of cancer-related fatalities worldwide, underscoring the urgency of early detection for improved patient outcomes. The main objective of this research is to harness the noble strategies of artificial intelligence for identifying and classifying lung cancers more precisely from CT scan images at the early stage. This study introduces a novel lung cancer detection method, which was mainly focused on Convolutional Neural Networks (CNN) and was later customized for binary and multiclass classification utilizing a publicly available dataset of chest CT scan images of lung cancer. The main contribution of this research lies in its use of a hybrid CNN-SVD (Singular Value Decomposition) method and the use of a robust voting ensemble approach, which results in superior accuracy and effectiveness for mitigating potential errors. By employing contrast-limited adaptive histogram equalization (CLAHE), contrast-enhanced images were generated with minimal noise and prominent distinctive features. Subsequently, a CNN-SVD-Ensemble model was implemented to extract important features and reduce dimensionality. The extracted features were then processed by a set of ML algorithms along with a voting ensemble approach. Additionally, Gradient-weighted Class Activation Mapping (Grad-CAM) was integrated as an explainable AI (XAI) technique for enhancing model transparency by highlighting key influencing regions in the CT scans, which improved interpretability and ensured reliable and trustworthy results for clinical applications. This research offered state-of-the-art results, which achieved remarkable performance metrics with an accuracy, AUC, precision, recall, F1 score, Cohen's Kappa and Matthews Correlation Coefficient (MCC) of 99.49%, 99.73%, 100%, 99%, 99%, 99.15% and 99.16%, respectively, addressing the prior research gaps and setting a new benchmark in the field. Furthermore, in binary class classification, all the performance indicators attained a perfect score of 100%. The robustness of the suggested approach offered more reliable and impactful insights in the medical field, thus improving existing knowledge and setting the stage for future innovations.

Authors

  • Md Sabbir Hossain
    Department of Electronics & Telecommunication Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.
  • Niloy Basak
    Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.
  • Md Aslam Mollah
    Department of Electronics & Telecommunication Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.
  • Md Nahiduzzaman
    Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh.
  • Mominul Ahsan
    Department of Computer Science, University of York, Deramore Lane, Heslington, York YO10 5GH, UK.
  • Julfikar Haider
    Department of Engineering, Manchester Metropolitan University, Chester St, Manchester M1 5GD, UK.