Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

Natural Language Processing (NLP) is essential for extracting concepts from narrative text in electronic health records (EHRs). When the targets are numerous and diverse, as with data elements (i.e., important concepts related to a particular medical condition), a plausible solution is to combine several NLP tools into an ensemble to improve extraction performance. However, it is unclear how much ensembles of popular NLP tools actually improve the extraction of such concepts. We therefore built an NLP ensemble pipeline that combines the strengths of popular NLP tools using seven ensemble methods, and we quantified the performance improvement that the ensembles achieved in extracting data elements for three very different cohorts. Evaluation results show that the pipeline can improve on the performance of the individual NLP tools, but the improvement varies widely by cohort.
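
To illustrate the general idea of combining tool outputs, the following is a minimal sketch of one common ensemble strategy, threshold voting over the concept sets returned by several tools. The abstract does not name the seven ensemble methods, so voting is an assumption chosen for illustration, not necessarily one of the authors' methods. The tool names, concept labels, gold standard, and the helpers `vote_ensemble` and `precision_recall_f1` are all hypothetical.

```python
from collections import Counter


def vote_ensemble(tool_outputs, min_votes):
    """Keep every concept reported by at least `min_votes` tools.

    min_votes=1 gives the union of all tools; min_votes=len(tool_outputs)
    gives the intersection; values in between give threshold voting.
    """
    counts = Counter()
    for concepts in tool_outputs.values():
        counts.update(set(concepts))  # each tool votes at most once per concept
    return {concept for concept, n in counts.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Score a predicted concept set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical concept sets extracted from one note by three tools.
tool_outputs = {
    "tool_a": {"fever", "rash", "cough"},
    "tool_b": {"fever", "rash"},
    "tool_c": {"fever", "conjunctivitis"},
}
# Hypothetical gold-standard annotation for the same note.
gold = {"fever", "rash", "conjunctivitis"}

union = vote_ensemble(tool_outputs, min_votes=1)
majority = vote_ensemble(tool_outputs, min_votes=2)

for name, predicted in [("union", union), ("majority", majority)]:
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case the union favors recall and the majority vote favors precision. A trade-off of this kind is one reason the benefit of an ensemble can differ from one set of target concepts to another.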

Authors

  • Tsung-Ting Kuo
    University of California San Diego, La Jolla, CA.
  • Pallavi Rao
    University of California, Davis, CA.
  • Cleo Maehara
    University of California, Los Angeles, CA.
  • Son Doan
    University of California San Diego, La Jolla, CA.
  • Juan D Chaparro
    University of California San Diego, La Jolla, CA.
  • Michele E Day
    University of California San Diego, La Jolla, CA.
  • Claudiu Farcas
    University of California San Diego, La Jolla, CA.
  • Lucila Ohno-Machado
    University of California San Diego, La Jolla, CA.
  • Chun-Nan Hsu
    University of California San Diego, La Jolla, CA.