Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types.

Journal: Discover oncology
Published Date:

Abstract

The growing burden of cancer and the recent surge in healthcare data availability call for new ways of analysing this multifactorial disease and improving patient outcomes. The aim of this study is to develop and evaluate prognostic cancer survival models across ten common cancer types based on a large patient sample. We compare the performance of different machine learning algorithms and assess the added value of genetic information in cancer prognosis. We also provide ways to improve model explainability, which is critical for model adoption in clinical practice. This study included data from 9977 patients with bladder, breast, colorectal, endometrial, glioma, leukaemia, lung, ovarian, prostate, and renal cancers. Genetic data collected through the 100,000 Genomes Project were linked with clinical and demographic data provided by the National Cancer Registration and Analysis Service, Hospital Episode Statistics and the Office for National Statistics. More than 500 prognostic features were assessed, and four machine learning algorithms were developed: Elastic Net Cox proportional hazards regression, random survival forest, gradient boosting survival and a DeepSurv neural network. Most models achieved good performance, with the C-index ranging from 60% in bladder cancer to 80% in glioma and averaging 72% across all cancer types. The different machine learning methods achieved similar performance, with the DeepSurv model slightly underperforming the other methods. Adding genetic data improved performance in endometrial, glioma, ovarian and prostate cancers, showing its potential importance for cancer prognosis. Patient age, stage, grade, referral route, waiting times, pre-existing conditions, previous hospital utilisation, tumour mutational burden and mutations in the TP53 gene were among the most important features in cancer survival modelling.
By offering a comprehensive set of predictive models for cancer survival, this study fills a critical gap in our understanding of cancer prognosis and provides new tools for informing cancer treatment and consequently improving patient outcomes.
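
For readers unfamiliar with the metric, the C-index (concordance index) quoted in the abstract can be read as the fraction of comparable patient pairs in which the model assigns the higher risk to the patient who died earlier. The following is a minimal sketch of Harrell's C-index in plain Python, not the authors' evaluation code. The function name `concordance_index` and the toy values are illustrative assumptions, and pairs with tied survival times are simply skipped to keep the sketch short.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death got the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy, made-up values (not study data): 4 of 5 comparable pairs are concordant.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported range of 60% to 80% lies between the two.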

Authors

  • Jurgita Gammall
    Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK. jurgita.gammall.20@ucl.ac.uk.
  • Alvina G Lai
    Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK. alvinagracelai@gmail.com.
