Machine Learning analysis of high-grade serous ovarian cancer proteomic dataset reveals novel candidate biomarkers.

Journal: Scientific Reports
PMID:

Abstract

Ovarian cancer is one of the most common gynecological malignancies, ranking third after cervical and uterine cancer. High-grade serous ovarian cancer (HGSOC) is one of its most aggressive subtypes, and because its symptoms appear late, the prognosis is unfavourable in most cases. Current predictive algorithms used to estimate the risk of ovarian cancer do not provide sufficient sensitivity and specificity for widespread use in clinical practice. Adding further biomarkers or parameters, such as age or menopausal status, to overcome these issues has yielded only marginal improvements. It is therefore necessary to identify novel molecular signatures and to develop new predictive algorithms that can support the diagnosis of HGSOC and, at the same time, deepen the understanding of this elusive disease, with the ultimate goal of improving patient survival. Here, we apply a machine learning-based pipeline to an open-source HGSOC proteomic dataset to develop a decision support system (DSS) that showed high discriminative ability on a dataset of HGSOC biopsies. The proposed DSS consists of a two-step feature selection followed by a decision tree, and its output is a combination of three highly discriminating proteins, TOP1, PDIA4, and OGN, which could be of interest for further clinical and experimental validation. Furthermore, we used the ranked list of proteins generated during the feature selection steps to perform a pathway analysis that provides a snapshot of the main deregulated pathways in HGSOC. The datasets used for this study are available in the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal (https://cptac-data-portal.georgetown.edu/).
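The abstract names the components of the DSS (a two-step feature selection followed by a decision tree) but not the specific selection criteria or tree settings. The Python sketch below is only an illustration under stated assumptions: step 1 is assumed to rank features by the absolute Welch t-statistic between two classes, step 2 is assumed to prune any feature strongly correlated (|Pearson r| ≥ `max_corr`) with a higher-ranked one, and the classifier is assumed to be a shallow Gini-impurity decision tree. These choices, every function name (`welch_score`, `two_step_selection`, `build_tree`, `predict`), the parameters `k`, `max_corr` and `max_depth`, and the toy matrix are hypothetical; they are not the authors' method and not CPTAC data.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical sketch: filter criteria and tree settings are assumptions, not the study's method.
# A tree node is a leaf (predicted class label) or (feature, threshold, left, right).
Node = Union[int, Tuple[int, float, "Node", "Node"]]


def welch_score(col: Sequence[float], y: Sequence[int]) -> float:
    """|Welch t| of one feature between class 0 and class 1 (needs >= 2 samples per class)."""
    a = [v for v, lab in zip(col, y) if lab == 0]
    b = [v for v, lab in zip(col, y) if lab == 1]
    ma, mb = mean(a), mean(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv) if su and sv else 0.0


def two_step_selection(X, y, k: int = 2, max_corr: float = 0.9) -> List[int]:
    """Step 1: rank features by Welch score. Step 2: walk the ranking and skip any
    feature whose |Pearson r| with an already kept feature is >= max_corr."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: welch_score(cols[j], y), reverse=True)
    kept: List[int] = []
    for j in ranked:
        if all(abs(pearson(cols[j], cols[i])) < max_corr for i in kept):
            kept.append(j)
        if len(kept) == k:
            break
    return kept


def gini(y: Sequence[int]) -> float:
    """Gini impurity for binary labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not y:
        return 0.0
    p = sum(y) / len(y)
    return 2 * p * (1 - p)


def best_split(X, y, features) -> Optional[Tuple[float, int, float]]:
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


def build_tree(X, y, features, depth: int = 0, max_depth: int = 2) -> Node:
    majority = int(2 * sum(y) >= len(y))  # leaf label; ties go to class 1
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y, features)
    if split is None or split[0] >= gini(y):  # no split reduces impurity
        return majority
    _, j, thr = split
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    left = build_tree([X[i] for i in li], [y[i] for i in li], features, depth + 1, max_depth)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], features, depth + 1, max_depth)
    return (j, thr, left, right)


def predict(node: Node, row: Sequence[float]) -> int:
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if row[j] <= thr else right
    return node


# Synthetic toy data (NOT from CPTAC): 8 samples x 4 features, binary labels.
X = [
    [1.0, 2.0, 0.5, 5.0], [1.2, 2.5, 0.7, 4.0], [0.9, 1.8, 0.6, 6.0], [1.1, 2.2, 0.4, 5.5],
    [3.0, 6.0, 0.9, 5.2], [3.2, 6.3, 0.6, 4.1], [2.9, 5.7, 1.0, 5.9], [3.1, 6.2, 0.8, 5.4],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = two_step_selection(X, y, k=2, max_corr=0.9)
tree = build_tree(X, y, selected, max_depth=2)
print("selected features:", selected)  # selected features: [0, 2]
print("training predictions:", [predict(tree, row) for row in X])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy matrix, column 1 is almost a rescaled copy of column 0, so step 2 prunes it and the tree is grown only on columns 0 and 2; a single split on column 0 then separates the two classes. A real analysis of proteomic data would also need cross-validation and an independent test set, which this sketch leaves out.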

Authors

  • Federica Farinella
    Division of Clinical Pathology, Laboratori Vita s.r.l., Via Sabaudia 19, 04100, Latina, Italy.
  • Mario Merone
    Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy. m.merone@unicampus.it.
  • Luca Bacco
    Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy.
  • Adriano Capirchio
    Computational and Translational Neuroscience Laboratory, Institute of Cognitive Sciences and Technologies, National Research Council (CTN-ISTC-CNR), Via San Martino della Battaglia 44, 00185, Rome, Italy.
  • Massimo Ciccozzi
    Unit of Medical Statistics and Epidemiology, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128, Rome, Italy.
  • Daniele Caligiore
    Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Rome, Italy. daniele.caligiore@istc.cnr.it.