Cohesive data analysis for the identification of prognostic hub genes and significant pathways associated with HER2+ and TN breast cancer types.

Journal: Scientific reports
Published Date:

Abstract

Breast cancer is the most prevalent and lethal form of cancer in women and one of their most common medical concerns. Its etiology implicates several cellular protein receptors, such as estrogen receptors (ER), progesterone receptors (PR), and human epidermal growth factor receptor 2 (HER2), which switch on oncogenic cascades often attributed to certain genetic variations. Breast cancer (BC) is thus classified into ER+/-, PR+/-, HER2+/- and triple-negative (TNBC) types. This study seeks to build upon current knowledge of the HER2+ and TNBC types to discover novel patterns for diagnosis and prognosis. It exploits the wealth of HER2+ and TNBC transcriptome (RNA-Seq) data to elucidate the key hub genes, their associated networks and pathways, their stage-wise expression profiles, their role in prognosis and survival expectancy, and their regulatory transcription factors. The study also employs machine learning (ML) models, including support vector machine (SVM), XGBoost, Random Forest, k-nearest neighbor (kNN), Naïve Bayes and a Voting Classifier, to distinguish between HER2+ and TNBC transcriptomes, a distinction that is key for early detection and for the choice of therapeutic alternatives. RNA-Seq datasets consisting of 49 HER2+ and 44 TNBC breast tumor samples were retrieved and pre-processed. Differentially expressed genes (DEGs) were identified along with their log fold change (logFC) and p-values. KEGG (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology) analyses of the DEGs were conducted using DAVID (the Database for Annotation, Visualization and Integrated Discovery), and an interaction network was constructed in Cytoscape. Ten hub genes, ACTB, ATM, ESR1, GAPDH, HNRNPK, KRAS, MDM2, SIRT1, TP53, and H3F3C (H3-5), were identified with cytoHubba based on maximum clique centrality (MCC), maximum neighborhood component (MNC), degree, closeness and betweenness. These hub genes were found to be associated with cell proliferation, invasion and migration.
The regulatory transcription factors of these hub genes and the association of their expression profiles with survival expectancy were also determined. Among the ML models, SVM stood out, classifying HER2+ versus TNBC transcriptomes with an accuracy of 90%. The findings of this study can therefore aid in establishing the early prognosis of BC and in identifying biomarkers for the personalized prevention, prediction, diagnosis, and treatment of BC.
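
As a rough illustration of two of the ranking criteria named in the abstract (degree and closeness), the following Python sketch scores the nodes of a small interaction network and lists the top-ranked ones. The edge list is a hypothetical toy example and not the network built in the study, the function names (build_graph, closeness, rank_hubs) are our own, and cytoHubba's MCC, MNC and betweenness scores are not reproduced here.

```python
from collections import deque

# Hypothetical toy interaction network. These edges are illustrative only
# and are NOT interactions reported by the study.
EDGES = [
    ("TP53", "MDM2"),
    ("TP53", "ATM"),
    ("TP53", "SIRT1"),
    ("TP53", "ESR1"),
    ("MDM2", "ATM"),
    ("ESR1", "KRAS"),
    ("KRAS", "ACTB"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness(graph, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_hubs(edges, top_n=3):
    """Rank nodes by degree, breaking ties by closeness and then by name."""
    graph = build_graph(edges)
    deg = {node: len(nbrs) for node, nbrs in graph.items()}
    clo = {node: closeness(graph, node) for node in graph}
    ranked = sorted(graph, key=lambda n: (-deg[n], -clo[n], n))
    return [(n, deg[n], round(clo[n], 3)) for n in ranked[:top_n]]


if __name__ == "__main__":
    for name, degree, close in rank_hubs(EDGES):
        print(name, degree, close)
```

On this toy edge list, rank_hubs(EDGES, top_n=2) places TP53 first and ESR1 second. That ordering reflects only the made-up edges above and says nothing about the ranking obtained in the study.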

Authors

  • Mahrukh Zakir
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Alishbah Saddiqa
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Mawara Sheikh
    Pakistan Agriculture Research Council Islamabad, Islamabad, Pakistan.
  • Lalarukh Zakir
    Azad Jammu and Kashmir Medical College, Muzaffarabad, Pakistan.
  • Fatima Sami
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Faisal Sardar Ahmad
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Sadaf Abdul Rauf
    Fatima Jinnah Women University, Rawalpindi, Pakistan.
  • Iqra Ali
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Zahid Muneer
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
  • Wadi B Alonazi
    Health Administration Department, College of Business Administration, King Saud University, P. O Box: 71115, Riyadh 11587, Saudi Arabia.
  • Abdul Rauf Siddiqi
    Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan. araufsiddiqi@comsats.edu.pk.