Comprehensive bioinformatics and machine learning analyses for breast cancer staging using TCGA dataset.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

Breast cancer is a major global health concern that encompasses a varied set of diseases with distinct molecular characteristics. Combining sophisticated computational methods with extensive biological datasets has emerged as an effective strategy for unravelling complex patterns in oncology. This research addresses breast cancer staging, classification, and diagnosis using the comprehensive dataset provided by The Cancer Genome Atlas (TCGA). By integrating advanced machine learning algorithms with bioinformatics analysis, it introduces a methodology for identifying complex molecular signatures associated with different subtypes and stages of breast cancer. The study uses TCGA gene expression data to detect and categorize breast cancer through machine learning and systems biology techniques. Differentially expressed genes in breast cancer were identified and analyzed through signaling pathways, protein-protein interactions, and regulatory networks to uncover potential therapeutic targets. Based on several analyses, the study also highlights specific proteins (MYH2, MYL1, MYL2, MYH7) and microRNAs (such as hsa-let-7d-5p) as potential biomarkers of cancer progression. For cancer staging, the random forest method achieved a diagnostic accuracy of 97.19%, while the XGBoost algorithm attained 95.23%. The study thus brings bioinformatics and machine learning together to find potential biomarkers that influence the progression of breast cancer. Combining sophisticated analytical methods with extensive genomic datasets offers a promising path for expanding our understanding of this intricate illness and for improving clinical outcomes in identifying and categorizing it.
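
The abstract does not say how the differentially expressed genes were called, so the following is only a minimal sketch of one common screening approach, not the authors' pipeline. It assumes a per-gene log2 fold change and a Welch t statistic computed between tumor and normal samples. The gene labels, expression values, pseudocount, and both thresholds are hypothetical and chosen purely for illustration.

```python
import math
import statistics


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression.

    The pseudocount (an assumption) avoids division by zero for
    genes with no measured expression.
    """
    return math.log2(
        (statistics.mean(tumor) + pseudocount)
        / (statistics.mean(normal) + pseudocount)
    )


def welch_t(tumor, normal):
    """Welch's t statistic for two groups with unequal variances."""
    standard_error = math.sqrt(
        statistics.variance(tumor) / len(tumor)
        + statistics.variance(normal) / len(normal)
    )
    return (statistics.mean(tumor) - statistics.mean(normal)) / standard_error


def differentially_expressed(expression, min_abs_lfc=1.0, min_abs_t=3.0):
    """Return genes whose |log2FC| and |t| both pass the cutoffs.

    Both cutoffs are illustrative assumptions, not values from the study.
    """
    hits = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        t_stat = welch_t(tumor, normal)
        if abs(lfc) >= min_abs_lfc and abs(t_stat) >= min_abs_t:
            hits[gene] = (lfc, t_stat)
    return hits


# Synthetic toy data: gene -> (tumor samples, normal samples).
toy_expression = {
    "GENE_A": ([2.0, 3.0, 2.5, 3.5], [40.0, 42.0, 39.0, 45.0]),
    "GENE_B": ([10.0, 11.0, 9.0, 10.5], [10.2, 9.8, 10.9, 10.1]),
    "GENE_C": ([80.0, 85.0, 78.0, 90.0], [20.0, 22.0, 19.0, 21.0]),
}

hits = differentially_expressed(toy_expression)
for gene, (lfc, t_stat) in sorted(hits.items()):
    direction = "down" if lfc < 0 else "up"
    print(f"{gene}: log2FC={lfc:.2f}, t={t_stat:.1f} ({direction}-regulated)")
```

In a real analysis the t statistic would normally be converted to a p-value and corrected for multiple testing across all genes. The selected genes could then serve as input features for a stage classifier such as the random forest or XGBoost models named in the abstract.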

Authors

  • Saurav Chandra Das
    Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh.
  • Wahia Tasnim
    Department of Computer Science and Engineering, Green University of Bangladesh, Narayanganj-1461, Dhaka, Bangladesh.
  • Humayan Kabir Rana
    Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh.
  • Uzzal Kumar Acharjee
    Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh.
  • Md Manowarul Islam
    Department of Computer Science and Engineering, Jagannath University, Dhaka-1100, Bangladesh.
  • Rabea Khatun
    Department of Computer Science and Engineering, Green University of Bangladesh, Narayanganj-1461, Dhaka, Bangladesh.