Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms.

Journal: Scientific reports
PMID:

Abstract

Early detection of pancreatic cancer (PC) remains challenging, largely because of its low population incidence and the small number of known risk factors. However, screening at-risk populations and detecting cancer early have the potential to significantly alter survival. In this study, we aimed to develop a predictive model that identifies patients at risk of developing new-onset PC within a 2.5- to 3-year time frame. We used the electronic health records (EHR) of a large medical system from 2000 to 2021 (N = 537,410). The EHR data analyzed in this work consist of patients' demographic information, diagnosis records, and lab values. These data were used both to identify patients who were diagnosed with pancreatic cancer and to derive the risk factors used by the machine learning algorithm for prediction. We identified 73 risk factors for pancreatic cancer with a Phenome-wide Association Study (PheWAS) on a matched case-control cohort. Using these risk factors, we built a large-scale, EHR-based machine learning model. We performed a temporally stratified validation on patients who were not included in any stage of model training. The model achieved an AUROC of 0.742 [0.727, 0.757], which was similar in the general population and in the subset of patients who had prior cross-sectional imaging. The rate of pancreatic cancer diagnosis among patients in the top 1 percentile of the risk score was sixfold higher than in the general population. Our model leverages data extracted from a 6-month window of the electronic health record to identify patients at nearly sixfold higher than baseline risk of developing pancreatic cancer 2.5-3 years from evaluation. This approach offers an opportunity to define, entirely from static data, an enriched population in which current screening may be recommended.
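The two evaluation quantities reported above, the AUROC and the fold enrichment of diagnoses in the top 1 percentile of the risk score, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auroc` and `fold_enrichment`, the `top_fraction` parameter, and the toy inputs are assumptions introduced here for illustration, and the abstract does not state how its interval estimate was computed, so none is attempted.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic), averaging ranks over ties.

    scores: predicted risk scores; labels: 1 for a case, 0 for a control.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of tied scores starting at i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fold_enrichment(scores, labels, top_fraction=0.01):
    """Case rate among the highest-scoring patients divided by the overall rate."""
    n_top = max(1, int(len(scores) * top_fraction))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_top]
    top_rate = sum(labels[i] for i in top) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Toy data only; these are not values from the study.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Under these assumptions, a fold enrichment near 6 at `top_fraction=0.01` would correspond to the sixfold figure quoted in the abstract.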

Authors

  • Weicheng Zhu
    Center for Data Science, NYU, 60 Fifth Avenue, 5th Floor, New York, NY, 10011, USA.
  • Long Chen
    Department of Critical Care Medicine, The First Affiliated Hospital, Fujian Medical University, Fuzhou, China.
  • Yindalon Aphinyanaphongs
    Department of Population Health, New York University, New York.
  • Fay Kastrinos
    Department of Medicine, Division of Digestive and Liver Diseases, Columbia University Irving Medical Center, New York, NY, USA.
  • Diane M Simeone
    Moores Cancer Center, UC San Diego Health, San Diego, CA, USA.
  • Mark Pochapin
    Division of Gastroenterology and Hepatology, Department of Medicine, New York University, 240 East 38th Street, 23rd Floor, New York, NY, 10016, USA.
  • Cody Stender
    Department of Surgery, New York University, New York, NY, USA.
  • Narges Razavian
    Department of Computer Science, New York University, New York, New York.
  • Tamas A Gonda
    Division of Gastroenterology and Hepatology, Department of Medicine, New York University, 240 East 38th Street, 23rd Floor, New York, NY, 10016, USA. Tamas.Gonda@nyulangone.org.