Automated real-world data integration improves cancer outcome prediction.

Journal: Nature
PMID:

Abstract

The digitization of health records and growing availability of tumour DNA sequencing provide an opportunity to study the determinants of cancer outcomes with unprecedented richness. However, patient data are often stored in unstructured text and siloed datasets. Here we combine natural language processing annotations with structured medication, patient-reported demographic, tumour registry and tumour genomic data from 24,950 patients at Memorial Sloan Kettering Cancer Center to generate a clinicogenomic, harmonized oncologic real-world dataset (MSK-CHORD). MSK-CHORD includes data for non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers and enables discovery of clinicogenomic relationships not apparent in smaller datasets. Leveraging MSK-CHORD to train machine learning models to predict overall survival, we find that models including features derived from natural language processing, such as sites of disease, outperform those based on genomic data or stage alone, as tested by cross-validation and on an external, multi-institution dataset. By annotating 705,241 radiology reports, MSK-CHORD also uncovers predictors of metastasis to specific organ sites, including a relationship between SETD2 mutation and lower metastatic potential in immunotherapy-treated lung adenocarcinoma, which is corroborated in independent datasets. We demonstrate the feasibility of automated annotation from unstructured notes and its utility in predicting patient outcomes. The resulting data are provided as a public resource for real-world oncologic research.

Authors

  • Justin Jee
    Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA. Electronic address: jeej@mskcc.org.
  • Christopher Fong
    Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA.
  • Karl Pichotta
    Department of Epidemiology and Biostatistics, Division of Computational Oncology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY 10065.
  • Thinh Ngoc Tran
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Anisha Luthra
    Human Pathology and Pathogenesis Program, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY 10065.
  • Michele Waters
    Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA.
  • Chenlian Fu
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Mirella Altoe
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Si-Yang Liu
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Steven B Maron
    Memorial Sloan Kettering Cancer Center, New York, NY.
  • Mehnaj Ahmed
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Susie Kim
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Mono Pirun
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Walid K Chatila
    Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
  • Ino de Bruijn
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Arfath Pasha
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Ritika Kundra
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Benjamin Gross
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Brooke Mastrogiacomo
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Tyler J Aprati
    Dana Farber Cancer Institute, Boston, MA, USA.
  • David Liu
    NASA Jet Propulsion Laboratory, Pasadena, CA.
  • Jianjiong Gao
    Department of Epidemiology and Biostatistics, Division of Computational Oncology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY 10065.
  • Marzia Capelletti
    Caris Life Sciences, Irving, TX, USA.
  • Kelly Pekala
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Lisa Loudon
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Maria Perry
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Chaitanya Bandlamudi
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Mark Donoghue
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Baby Anusha Satravada
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Axel Martin
    Biostatistics Service, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Ronglai Shen
    Biostatistics Service, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Yuan Chen
    Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032.
  • A Rose Brannon
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Jason Chang
    UCLA David Geffen School of Medicine, Los Angeles, CA, United States of America.
  • Lior Braunstein
    Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Anyi Li
    Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA.
  • Anton Safonov
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Aaron Stonestrom
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Pablo Sanchez-Vela
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Clare Wilhelm
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Mark Robson
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Howard Scher
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Marc Ladanyi
    Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Jorge S Reis-Filho
    Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • David B Solit
    Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York.
  • David R Jones
    Thoracic Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY.
  • Daniel Gomez
    Department of Radiation Oncology, MD Anderson Cancer Center, Houston, Texas, USA.
  • Helena Yu
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Debyani Chakravarty
    Kravis Center of Molecular Oncology, Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY.
  • Rona Yaeger
    Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY.
  • Wassim Abida
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Wungki Park
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Eileen M O'Reilly
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Julio Garcia-Aguilar
    Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Nicholas Socci
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Francisco Sanchez-Vega
    Marie-Josée & Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
  • Jian Carrot-Zhang
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Peter D Stetson
    Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY.
  • Ross Levine
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Charles M Rudin
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Michael F Berger
    Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Sohrab P Shah
    Department of Epidemiology and Biostatistics, Division of Computational Oncology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY 10065.
  • Deborah Schrag
    Memorial Sloan Kettering Cancer Center, New York, USA.
  • Pedram Razavi
    Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Kenneth L Kehl
    Department of Medicine, Dana-Farber Cancer Institute, Boston, MA, 02215, United States.
  • Bob T Li
    Memorial Sloan Kettering Cancer Center, New York, NY, USA.
  • Gregory J Riely
    Thoracic Oncology Service, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Nikolaus Schultz
    Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.