A Frame-Based NLP System for Cancer-Related Information Extraction.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We utilize a deep learning-based approach, a bidirectional Long Short-Term Memory (LSTM) Conditional Random Field (CRF) model, which uses both character and word embeddings. The system consists of two constituent sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The classifier achieves an F1 of 93.70 for cancer diagnosis, 96.33 for therapeutic procedure, and 87.18 for tumor description. These represent improvements of 10.72, 0.85, and 8.04 over a baseline heuristic, respectively. Additionally, we demonstrate that the combination of both GloVe and MIMIC-III embeddings has the best representational effect. Overall, this study demonstrates the effectiveness of deep learning methods for extracting frame semantic information from clinical narratives.
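
As a quick check on the reported numbers, the baseline scores implied by the abstract can be recovered by subtracting each stated improvement from the corresponding reported score. The Python sketch below does only that arithmetic. The variable names are illustrative and not taken from the paper, and it assumes the improvements are absolute point differences rather than relative gains, which the abstract does not state explicitly.

```python
# Scores and improvements exactly as given in the abstract.
# Dictionary names are illustrative, not from the paper.
reported = {
    "cancer diagnosis": 93.70,
    "therapeutic procedure": 96.33,
    "tumor description": 87.18,
}
improvement = {
    "cancer diagnosis": 10.72,
    "therapeutic procedure": 0.85,
    "tumor description": 8.04,
}

# Assumption: improvements are absolute point differences,
# so implied baseline = reported score - improvement.
implied_baseline = {
    frame: round(reported[frame] - improvement[frame], 2)
    for frame in reported
}

for frame, score in implied_baseline.items():
    print(f"{frame}: implied baseline {score:.2f}")
```

Under that assumption, the gain is largest for cancer diagnosis and tumor description, while the baseline heuristic is already close to the reported score for therapeutic procedure.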

Authors

  • Yuqi Si
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Kirk Roberts
    The University of Texas Health Science Center at Houston, USA.