Predicting time-to-first cancer diagnosis across multiple cancer types.
Journal:
Scientific Reports
Published Date:
Jul 9, 2025
Abstract
Cancer causes over 10 million deaths annually worldwide, and 40.5% of Americans are expected to be diagnosed with it in their lifetime. Early detection is critical: for liver cancer, survival rates improve from 4% to 37% when the disease is caught early. However, predicting time to first cancer diagnosis is challenging because cancer risk is complex and multifactorial. We developed predictive models of time to first cancer diagnosis for high-incidence cancers, including lung, liver, and bladder cancers, using the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial for training and the UK Biobank for evaluation. We fit Cox proportional hazards models with elastic net regularization, survival decision trees, and random survival forests on 46 sex-agnostic demographic, clinical, and behavioral features. The Cox model achieved a C-index of 0.813 for lung cancer, surpassing the non-parametric machine learning methods in both accuracy and interpretability. Cancer-specific models consistently outperformed non-specific cancer models in time-dependent AUC analyses. Scaled Cox coefficients revealed novel insights, including an inverse association between BMI and lung cancer risk. Our findings offer interpretable, accurate tools for personalized cancer risk assessment that can improve early detection and bridge computational advances with clinical practice.
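To make the reported C-index concrete, the following is a minimal sketch of a concordance index for right-censored data. It assumes Harrell's formulation; the abstract does not state which C-index variant was used. The function name `concordance_index` and the four-patient toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index (assumed variant; illustrative only).

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when the model gives subject i the higher risk score. Tied risk scores
    count as half. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: years to diagnosis or censoring, event flag, risk score.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [True, True, False, True]
toy_risk = [0.9, 0.4, 0.5, 0.1]

toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
print(toy_c_index)  # 4 concordant of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of 0.813 means that, among comparable pairs, the person diagnosed earlier received the higher predicted risk roughly 81% of the time.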