Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data.

Journal: American Journal of Respiratory and Critical Care Medicine
Published Date:

Abstract

Most lung cancers are diagnosed at an advanced stage. Presymptomatic identification of high-risk individuals can prompt earlier intervention and improve long-term outcomes. The objective was to develop a machine learning model that predicts a future diagnosis of lung cancer from routine clinical and laboratory data. We assembled data from 6,505 case patients with non-small cell lung cancer (NSCLC) and 189,597 contemporaneous control subjects and compared the accuracy of a novel machine learning model with that of a modified version of the well-validated 2012 Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial risk model (mPLCOm2012), using the area under the receiver operating characteristic curve (AUC), sensitivity, and diagnostic odds ratio (OR) as measures of model performance. Among ever-smokers in the test set, the machine learning model was more accurate than the mPLCOm2012 for identifying NSCLC 9-12 months before clinical diagnosis (P < 0.00001) and demonstrated an AUC of 0.86, a diagnostic OR of 12.3, and a sensitivity of 40.1% at a predefined specificity of 95%. In comparison, the mPLCOm2012 demonstrated an AUC of 0.79, a diagnostic OR of 7.4, and a sensitivity of 27.9% at the same specificity. The machine learning model was more accurate than standard eligibility criteria for lung cancer screening and more accurate than the mPLCOm2012 when applied to a screening-eligible population. Influential model variables included known risk factors and novel predictors such as white blood cell and platelet counts. The machine learning model was more accurate for early diagnosis of NSCLC than either standard eligibility criteria for screening or the mPLCOm2012, demonstrating the potential to help prevent lung cancer deaths through early detection.
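The reported sensitivity, specificity, and diagnostic OR are linked by the textbook definition of the diagnostic odds ratio: the odds of a positive result among cases divided by the odds of a positive result among controls. The sketch below applies that definition to the abstract's figures. It is illustrative only: the function name is hypothetical, and the abstract does not say how the authors computed the diagnostic OR. Using exactly 95% specificity gives roughly 12.7 and 7.4, close to but not identical with the reported 12.3 for the machine learning model, which may reflect rounding or an achieved specificity that differed slightly from 95% (an assumption, not something the abstract states).

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Textbook diagnostic OR: odds of a positive test in cases
    divided by odds of a positive test in controls.

    Hypothetical helper for illustration; not the authors' code.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_in_cases = sensitivity / (1.0 - sensitivity)
    odds_positive_in_controls = (1.0 - specificity) / specificity
    return odds_positive_in_cases / odds_positive_in_controls


# Figures taken from the abstract; specificity assumed to be exactly 0.95.
ml_model_dor = diagnostic_odds_ratio(sensitivity=0.401, specificity=0.95)
mplcom2012_dor = diagnostic_odds_ratio(sensitivity=0.279, specificity=0.95)

print(f"Machine learning model: {ml_model_dor:.2f}")
print(f"mPLCOm2012:             {mplcom2012_dor:.2f}")
```

Because both models are evaluated at the same specificity, the ratio of their diagnostic ORs depends only on the two sensitivities, which makes the sensitivity gap (40.1% versus 27.9%) the figure that drives the comparison.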

Authors

  • Michael K Gould
    Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA.
  • Brian Z Huang
    Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA.
  • Martin C Tammemagi
    Department of Community Health Sciences, Brock University, St. Catharines, Ontario, Canada.
  • Yaron Kinar
    Medial Research, Kfar Malal, Israel.
  • Ron Shiff
    Medial EarlySign, Newton, Massachusetts.