Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data.

Journal: JAMIA open

Abstract

OBJECTIVES: Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMRs) and cancer registries contain complementary information on cancer diagnosis, treatment, and outcome, yet are rarely used synergistically. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR and California Cancer Registry (CCR) data.

Authors

  • Albee Y Ling
    Biomedical Informatics Training Program, Stanford University, Stanford, CA.
  • Allison W Kurian
    Department of Medicine, Stanford University School of Medicine, Stanford, CA.
  • Jennifer L Caswell-Jin
    Department of Medicine, Stanford University School of Medicine, Stanford, CA.
  • George W Sledge
    Department of Medicine, Stanford University School of Medicine, Stanford, CA.
  • Nigam H Shah
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
  • Suzanne R Tamang
    Department of Biomedical Data Science, Stanford University, Stanford, CA.
