[Identification of Protein-Coding Gene Markers in Breast Invasive Carcinoma Based on Machine Learning].

Journal: Zhongguo yi xue ke xue yuan xue bao. Acta Academiae Medicinae Sinicae
Published Date:

Abstract

Objective: To screen for biomarkers linked to the prognosis of breast invasive carcinoma by analyzing transcriptome data with random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost).

Methods: We obtained breast invasive carcinoma expression data from The Cancer Genome Atlas and used DESeq2, the t-test, and univariate Cox regression analysis to identify differentially expressed protein-coding genes associated with survival in human breast invasive carcinoma samples. We then built RF, XGBoost, LightGBM, and CatBoost models to mine protein-coding gene markers related to the prognosis of breast invasive carcinoma and compared the performance of the models. Breast cancer expression data from the Gene Expression Omnibus were used for validation.

Results: A total of 151 differentially expressed protein-coding genes related to survival prognosis were screened out. The machine learning model built with C3orf80, UGP2, and SPC25 showed the best performance.

Conclusions: Three protein-coding genes (UGP2, C3orf80, and SPC25) were identified as markers of breast invasive carcinoma. This study provides a new direction for the diagnosis and treatment of breast invasive carcinoma.
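The univariate Cox screening step in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: one expression vector per gene, follow-up times with event indicators (1 = death, 0 = censored), Breslow handling of tied times, and a hypothetical significance cutoff `alpha = 0.05`, because the abstract does not report the threshold. The function names `cox_univariate` and `screen_genes` and the gene names in the example are hypothetical. In practice a dedicated package such as R's `survival::coxph` or Python's `lifelines` would normally be used.

```python
import math


def _score_info(beta, x, time, event):
    """Score (first derivative) and observed information (negative second
    derivative) of the Cox log partial likelihood for a single covariate.
    Ties use the Breslow approximation: each event at time t uses the
    risk set {j : time[j] >= t}."""
    score = 0.0
    info = 0.0
    for i in range(len(x)):
        if not event[i]:
            continue  # censored samples only contribute through risk sets
        s0 = s1 = s2 = 0.0
        for j in range(len(x)):
            if time[j] >= time[i]:
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] * x[j]
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean  # weighted variance within the risk set
    return score, info


def cox_univariate(x, time, event, max_iter=50, tol=1e-9, max_step=2.0):
    """Fit h(t | x) = h0(t) * exp(beta * x) by clamped Newton-Raphson.
    Returns (beta, standard error, two-sided Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, x, time, event)
        if info <= 1e-12:
            # No variation in x within the risk sets: treat as uninformative
            return beta, math.inf, 1.0
        # Newton step, clamped to avoid overflow if the likelihood is flat
        step = max(-max_step, min(max_step, score / info))
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, x, time, event)
    if info <= 1e-12:
        return beta, math.inf, 1.0  # estimate not usable (e.g. divergence)
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


def screen_genes(expr, time, event, alpha=0.05):
    """Keep genes whose univariate Cox Wald p-value is below alpha.
    expr maps a (hypothetical) gene name to its values, one per sample."""
    selected = {}
    for gene, values in expr.items():
        beta, _, p = cox_univariate(values, time, event)
        if p < alpha:
            selected[gene] = (beta, p)
    return selected


if __name__ == "__main__":
    # Toy data: six samples, all with an observed event
    time = [6, 5, 4, 2, 3, 1]
    event = [1, 1, 1, 1, 1, 1]
    expr = {"GENE_A": [1, 2, 3, 4, 5, 6], "GENE_B": [2, 2, 2, 2, 2, 2]}
    for name, values in expr.items():
        print(name, cox_univariate(values, time, event))
```

Each gene is fitted on its own, so this step only measures marginal association with survival; a positive `beta` means higher expression goes with higher hazard. In the workflow described above, the differential-expression filter (DESeq2 and the t-test) would presumably be applied before this screen, and the retained genes passed on to the tree-based models.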

Authors

  • Yue Wu
    Key Laboratory of Luminescence and Real-Time Analytical Chemistry (Ministry of Education), College of Pharmaceutical Sciences, Southwest University, Chongqing 400716, China.
  • Kai-Yuan Min
    State Key Laboratory of Common Mechanism Research for Major Diseases, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China.
  • Jiang-Feng Liu
    State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 5, Dongdan 3, Dongcheng District, Beijing 100005, China.
  • Wan-Feng Liang
    School of Statistics and Data Science, Nankai University, Tianjin 300071, China.
  • Ye-Hong Yang
    State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 5, Dongdan 3, Dongcheng District, Beijing 100005, China.
  • Gang Hu
    Ping An Health Technology, Beijing, China.
  • Jun-Tao Yang
    State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 5, Dongdan 3, Dongcheng District, Beijing 100005, China.