Proposed Comprehensive Methodology Integrated with Explainable Artificial Intelligence for Prediction of Possible Biomarkers in Metabolomics Panel of Plasma Samples for Breast Cancer Detection.

Journal: Medicina (Kaunas, Lithuania)
PMID:

Abstract

Background and Objectives: Breast cancer (BC) is the most common type of cancer in women, accounting for more than 30% of new female cancers each year. Although various treatments are available for BC, most cancer-related deaths are due to incurable metastases. Early diagnosis and treatment of BC, before metastasis occurs, are therefore crucial. Mammography and ultrasonography are the primary clinical methods for the initial identification and staging of BC; they are useful for general screening but are limited in sensitivity and specificity. Omics-based biomarkers, such as those from metabolomics, can improve the accuracy of early diagnosis and of disease-progression tracking, and can support personalized treatment plans tailored to each tumor's specific molecular profile. Metabolomics technology is a feasible and comprehensive method for early disease detection and biomarker identification at the molecular level. This research aimed to establish an interpretable predictive artificial intelligence (AI) model using plasma-based metabolomics panel data to identify potential biomarkers that distinguish BC individuals from healthy controls.

Materials and Methods: A cohort of 138 BC patients and 76 healthy controls was studied. Plasma metabolites were examined using LC-TOFMS and GC-TOFMS techniques. Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and Random Forest (RF) were evaluated using performance metrics such as Receiver Operating Characteristic-Area Under the Curve (ROC AUC), accuracy, sensitivity, specificity, and F1 score. ROC and Precision-Recall (PR) curves were generated for comparative analysis. SHapley Additive exPlanations (SHAP) analysis was applied to the optimal prediction model to assess its interpretability.

Results: The RF algorithm showed the best accuracy (0.963 ± 0.043) and sensitivity (0.977 ± 0.051); however, LightGBM achieved the highest ROC AUC (0.983 ± 0.028). RF also achieved the best Precision-Recall Area Under the Curve (PR AUC) at 0.989. SHAP analysis identified glycerophosphocholine and pentosidine as the most significant discriminatory metabolites. Uracil, glutamine, and butyrylcarnitine were also among the significant metabolites.

Conclusions: Metabolomics biomarkers combined with an explainable AI (XAI)-based prediction model showed high diagnostic accuracy and sensitivity in the detection of BC. The proposed XAI system, which uses interpretable metabolite data, can serve as a clinical decision support tool to improve early diagnosis processes.
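
As an illustration of the performance metrics named above (accuracy, sensitivity, specificity, F1 score, and ROC AUC), the following is a minimal sketch in plain Python. It is not the study's pipeline: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names `classification_metrics` and `roc_auc` are hypothetical helpers introduced only for this example. Here 1 stands for a BC case and 0 for a healthy control.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count one half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (NOT from the study): four hypothetical cases and four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(f"ROC AUC = {auc:.4f}")
```

Sensitivity, specificity, accuracy, and F1 depend on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds. That difference is one way a model can lead on accuracy and sensitivity while another leads on ROC AUC.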

Authors

  • Cemil Colak
    Inonu University, Faculty of Medicine, Department of Biostatistics and Medical Informatics, Malatya, Turkey. Electronic address: cemilcolak@yahoo.com.
  • Fatma Hilal Yagin
    Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya, Türkiye.
  • Abdulmohsen Algarni
    Computer Science, King Khalid University, Abha, Saudi Arabia.
  • Ali Algarni
    Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia.
  • Fahaid Al-Hashem
    Department of Physiology, College of Medicine, King Khalid University, Abha, Saudi Arabia.
  • Luca Paolo Ardigò
    Department of Teacher Education, NLA University College, Oslo, Norway.