Biomarker discovery for early breast cancer diagnosis using machine learning on transcriptomic data for biosensor development.
Journal:
Computers in Biology and Medicine
Published Date:
Jul 11, 2025
Abstract
Breast cancer is the second leading cause of female mortality globally. Effective diagnostic tools, such as biosensors that utilize reliable biomarkers, are essential for early detection, particularly in low-income countries. This study introduces a novel bioinformatics pipeline that uses machine learning algorithms (MLAs) to identify genetic biomarkers for classifying breast cancer samples into non-malignant, non-triple-negative, and triple-negative categories. Five Gene Selection Approaches (GSAs) were employed: LASSO (Least Absolute Shrinkage and Selection Operator), Membrane LASSO, Surfaceome LASSO, Network Analysis, and Feature Importance Score (FIS). We implemented three factorial designs to assess the impact of MLAs and GSAs on classification performance (F1 Macro and Accuracy) in both cell lines and patient samples. Using Recursive Feature Elimination (RFE) and Genetic Algorithms (GAs) with the first four GSAs, we reduced the gene count to eight per GSA while maintaining an F1 Macro ≥80%. Consequently, 95.5% of our treatments with these gene sets achieved an F1 Macro or Accuracy ranging from 70.3% to 97.2%. We analyzed 37 genes for their predictive power in terms of five-year survival and relapse-free survival and compared them with genes from four commercial panels. Notably, thirteen genes (MFSD2A, TMEM74, SFRP1, UBXN10, CACNA1H, ERBB2, SIDT1, TMEM129, MME, FLRT2, CA12, ESR1, and TBC1D9) were significant predictors of survival for up to five years. TBC1D9, UBXN10, SFRP1, and MME were significant for relapse-free survival after five years. Six of the genes (FOXC1, MLPH, FOXA1, ESR1, ERBB2, and SFRP1) also matched those described in commercial panels. The choice of MLA had no statistically significant effect on F1 Macro or Accuracy. Altogether, the genetic biomarkers identified in this study hold potential for use in biosensors aimed at breast cancer diagnosis and treatment.
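For readers unfamiliar with the two performance metrics named in the abstract, the following is a minimal sketch of how F1 Macro and Accuracy can be computed for a three-class problem. It is not the authors' code: the label strings, the function names, and the toy predictions are all assumptions made for illustration, and a class with no true or predicted samples is scored as 0 by choice.

```python
from collections import Counter

# Illustrative label strings for the three categories (an assumption;
# the abstract does not specify how classes are encoded).
CLASSES = ("non-malignant", "non-TNBC", "TNBC")


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct call for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); scored as 0.0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented purely to exercise the functions.
toy_true = ["non-malignant", "non-malignant", "non-TNBC", "non-TNBC",
            "non-TNBC", "TNBC", "TNBC", "TNBC"]
toy_pred = ["non-malignant", "non-TNBC", "non-TNBC", "non-TNBC",
            "TNBC", "TNBC", "TNBC", "non-malignant"]

toy_accuracy = accuracy(toy_true, toy_pred)
toy_f1_macro = f1_macro(toy_true, toy_pred)
```

Because F1 Macro averages the per-class scores without weighting by class size, a small class such as triple-negative samples counts as much as a large one, which is one plausible reason to report it alongside Accuracy when classes are imbalanced.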