Predicting Prostate Cancer Diagnosis Using Machine Learning Analysis of Healthcare Utilization Patterns.

Journal: Studies in health technology and informatics
PMID:

Abstract

This study investigated healthcare utilization patterns preceding prostate cancer diagnosis, with the aim of developing machine learning models for early prediction of the diagnosis. Data were drawn from the All of Us Research Program, focusing on adult patients diagnosed with prostate cancer between 2010 and 2019. Key variables, including PSA values, comorbidity index, and symptoms, were derived from procedure, measurement, and condition records. Multiple machine learning models were tested for predicting prostate cancer 3, 6, 9, and 12 months in advance. The dataset included 1,276 cancer patients and 1,232 non-cancer patients. The XGBoost model performed best at 3 months, achieving an accuracy and F1 score of 0.73 each and an AUC of 0.82. At 6 months, the model achieved an accuracy and F1 score of 0.71 and an AUC of 0.78. Performance declined with longer prediction windows. PSA values were consistently the most important predictor across all prediction windows, along with other factors such as triglyceride and creatinine levels.
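Predicting a diagnosis 3, 6, 9, or 12 months in advance implies that, for each window, features may only come from records dated sufficiently before the prediction target. The abstract does not describe how this cutoff was implemented, so the following is a minimal Python sketch under stated assumptions: each patient has a hypothetical index date (the diagnosis date for cancer patients; for non-cancer patients some comparable reference date would have to be chosen), and the `Record` type, the `months_before` and `build_features` helpers, and the two features computed are illustrative only, not the authors' pipeline.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical flattened EHR row (procedure, measurement, or condition)."""
    patient_id: str
    record_date: date
    kind: str                      # e.g. "measurement", "condition", "procedure"
    name: str                      # e.g. "PSA"
    value: Optional[float] = None  # numeric result, if any


def months_before(d: date, months: int) -> date:
    """Shift a date back by whole calendar months, clamping the day
    (e.g. May 31 minus 3 months gives Feb 28 in a non-leap year)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def build_features(records: List[Record],
                   index_dates: Dict[str, date],
                   horizon_months: int) -> Dict[str, dict]:
    """Build per-patient features using only records dated strictly before
    (index date - horizon). Feature choice here is purely illustrative."""
    features = {}
    for pid, index_date in index_dates.items():
        cutoff = months_before(index_date, horizon_months)
        visible = [r for r in records
                   if r.patient_id == pid and r.record_date < cutoff]
        psa = [r for r in visible if r.name == "PSA" and r.value is not None]
        latest_psa = max(psa, key=lambda r: r.record_date).value if psa else None
        features[pid] = {"latest_psa": latest_psa, "n_records": len(visible)}
    return features


# Toy example data (fabricated for illustration, not from the study).
records = [
    Record("p1", date(2018, 1, 10), "measurement", "PSA", 3.1),
    Record("p1", date(2018, 11, 2), "measurement", "PSA", 6.8),
    Record("p1", date(2018, 12, 20), "condition", "dysuria"),
]
index_dates = {"p1": date(2019, 3, 1)}

print(build_features(records, index_dates, 3))
# {'p1': {'latest_psa': 6.8, 'n_records': 2}}
print(build_features(records, index_dates, 12))
# {'p1': {'latest_psa': 3.1, 'n_records': 1}}
```

In this toy example, widening the window from 3 to 12 months hides the most recent PSA value, which illustrates one plausible reason performance could decline with longer prediction windows. The resulting feature dictionaries would then be vectorized and passed to a classifier such as XGBoost.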

Authors

  • Wanting Cui
    Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Ahmad Halwani
  • Chunyang Li
    West China Hospital of Sichuan University, China.
  • Joseph Finkelstein
    Department of Biomedical Informatics, School of Medicine, University of Utah, USA.