Biomarker discovery for early breast cancer diagnosis using machine learning on transcriptomic data for biosensor development.

Journal: Computers in biology and medicine
Published Date:

Abstract

Breast cancer is the second leading cause of female mortality globally. Effective diagnostic tools, such as biosensors that utilize reliable biomarkers, are essential for early detection, particularly in low-income countries. This study introduces a novel bioinformatics pipeline that uses machine learning algorithms (MLAs) to identify genetic biomarkers for classifying breast cancer into non-malignant, non-triple-negative, and triple-negative categories. Five Gene Selection Approaches (GSAs) were employed: LASSO (Least Absolute Shrinkage and Selection Operator), Membrane LASSO, Surfaceome LASSO, Network Analysis, and Feature Importance Score (FIS). We implemented three factorial designs to assess the impact of MLAs and GSAs on classification performance (F1 Macro and Accuracy) in both cell lines and patient samples. Using Recursive Feature Elimination (RFE) and Genetic Algorithms (GAs) in the first four GSAs, we reduced the gene count to eight per GSA while maintaining an F1 Macro ≥80 %. Consequently, 95.5 % of our treatments with these gene sets achieved an F1 Macro or Accuracy ranging from 70.3 % to 97.2 %. We analyzed 37 genes for their predictive power in terms of five-year survival and relapse-free survival and compared them with genes from four commercial panels. Notably, thirteen genes (MFSD2A, TMEM74, SFRP1, UBXN10, CACNA1H, ERBB2, SIDT1, TMEM129, MME, FLRT2, CA12, ESR1, and TBC1D9) showed significant predictive capabilities for up to five years of survival. TBC1D9, UBXN10, SFRP1, and MME were significant for relapse-free survival after five years. The FOXC1, MLPH, FOXA1, ESR1, ERBB2, and SFRP1 genes also matched those described in commercial panels. The influence of MLA on F1 Macro and Accuracy was not statistically significant. Altogether, the genetic biomarkers identified in this study hold potential for use in biosensors aimed at breast cancer diagnosis and treatment.
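The abstract reports both F1 Macro and Accuracy for a three-class problem (non-malignant, non-triple-negative, triple-negative). As a minimal sketch of how those two metrics are computed and why they can diverge when classes are imbalanced, the following uses only the Python standard library. The labels and predictions are invented toy data, not the study's results, and the function and class names are hypothetical.

```python
from collections import Counter

# Hypothetical class names mirroring the three categories in the abstract.
CLASSES = ("non-malignant", "non-TNBC", "TNBC")


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: the majority class is classified perfectly,
# while each minority class loses one sample to the majority class.
y_true = ["non-malignant"] * 2 + ["non-TNBC"] * 6 + ["TNBC"] * 2
y_pred = (["non-malignant", "non-TNBC"]
          + ["non-TNBC"] * 6
          + ["TNBC", "non-TNBC"])

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"F1 Macro: {f1_macro(y_true, y_pred):.3f}")
```

On this toy data the accuracy is 0.80 while the macro F1 is about 0.73, because macro averaging weights each class equally and so penalizes errors on the two smaller classes more heavily than accuracy does.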

Authors

  • Kalaumari Mayoral-Peña
    School of Engineering and Sciences, Campus Queretaro, Tecnologico de Monterrey, Queretaro, 76130, Mexico; Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
  • Omar Israel González Peña
    Evidence-Based Medicine Research Unit, Children's Hospital of Mexico Federico Gómez, National Institute of Health, Mexico City, 06720, Mexico; Vice-Rectorate of Health Sciences, Universidad de Monterrey, San Pedro Garza García, Nuevo León, 66238, Mexico; School of Engineering and Sciences, Campus Monterrey, Tecnologico de Monterrey, Monterrey, 64849, Mexico; Education and Research Department, Hospital Clínica Nova de Monterrey, San Nicolas de los Garza, 66450, Nuevo León, Mexico; International University of La Rioja, Avda. de la Paz, 137, Logroño, La Rioja, Spain. Electronic address: ogonzalez.pena@gmail.com.
  • Natalie Artzi
    Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA.
  • Marcos de Donato
    School of Engineering and Sciences, Campus Queretaro, Tecnologico de Monterrey, Queretaro, 76130, Mexico; The Center for Aquaculture Technologies, San Diego, CA, 92121, USA. Electronic address: mdedonate@tec.mx.