Explainable machine learning framework for the molecular classification of triple negative breast cancer.

Journal: Computer methods and programs in biomedicine

Abstract

BACKGROUND AND OBJECTIVE: Differences in the molecular characteristics of triple negative breast cancer (TNBC) help distinguish its four prominent subtypes: basal-like 1 (BL1), basal-like 2 (BL2), mesenchymal (M), and luminal androgen receptor (LAR). This study presents the first integrative framework that combines explainable AI with machine learning approaches to classify TNBC subtypes. Unlike conventional models, our approach offers interpretability while enabling biomarker prioritization by identifying key hub genes that drive subtype-specific predictions. METHODS: A total of 783 cases (BL1 (160), BL2 (75), M (151), LAR (106), non-TNBC (291)) reported in the Gene Expression Omnibus (GEO) and the Genomic Data Commons (GDC) data portal were used for the analysis. The proposed framework comprises a module that identifies gene signatures for the four subtypes, followed by a classification module that compares eight machine learning algorithms. The Random Forest classifier performed best, with 96% testing accuracy, and was selected for the explainability module based on SHapley Additive exPlanations (SHAP). RESULTS: The explainable biomarker module yielded a set of 47 biomarkers relevant for distinguishing the four subtypes of triple negative breast cancer. These biomarkers have the potential to be considered for TNBC prognosis in clinical settings. CONCLUSION: Key findings highlight the hub genes CDC20, CDCA2, PIMREG, KIF2C, and CENPW, implicating pathways such as ubiquitin-proteasome signaling and microtubule dynamics. These insights pave the way for biomarker-driven therapies and precision medicine in triple negative breast cancer.
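
To illustrate the attribution idea behind Shapley Additive Explanations, the following is a minimal sketch that computes exact Shapley values for a toy scoring function. It is not the authors' pipeline: the function `toy_score`, the feature names `gene_a`, `gene_b`, `gene_c`, and all numeric values are hypothetical and chosen only so the arithmetic is easy to follow. In practice, SHAP tooling approximates these values for a trained model such as a Random Forest.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of predict(instance) relative to predict(baseline).

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline to their observed value.
    Cost grows factorially, so this only suits a handful of features.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    for order in permutations(features):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(features))
    return {name: total / n_orders for name, total in totals.items()}


def toy_score(x):
    # Hypothetical subtype score: one additive term plus one interaction term.
    return 2.0 * x["gene_a"] + x["gene_b"] * x["gene_c"]


# Synthetic expression values; not taken from any dataset.
instance = {"gene_a": 1.0, "gene_b": 2.0, "gene_c": 3.0}
baseline = {"gene_a": 0.0, "gene_b": 0.0, "gene_c": 0.0}

phi = shapley_values(toy_score, instance, baseline)

# Rank features by absolute attribution, largest first.
ranking = sorted(phi, key=lambda g: abs(phi[g]), reverse=True)
print(phi)
print(ranking)
```

The attributions sum to the difference between the score for the instance and the score for the baseline, and the interaction term is split equally between the two features involved. Ranking features by absolute attribution is one simple way a prioritized biomarker list could be derived from such values.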
