Pseudo-data generation for the extraction of Problems, Treatments and Tests.

Journal: AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

Abstract

One of the primary challenges for clinical Named Entity Recognition (NER) is the limited availability of annotated training data: technical and legal hurdles prevent the creation and release of corpora derived from electronic health records (EHRs). In this work, we examine the impact of generating pseudo-data with gazetteers on clinical NER performed by a neural network model. We report that gazetteers can lead to the inclusion of proper terms and the exclusion of determiners and pronouns in preceding and middle positions. Gazetteers containing a larger number of terms that also occur in the original dataset had a greater impact.
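The abstract does not spell out how the pseudo-data is built. As a rough illustration only, the following sketch assumes one common gazetteer-based strategy, mention replacement: each annotated entity in a BIO-tagged sentence is swapped for a random gazetteer term of the same type, producing a new labeled sentence. It also computes the fraction of a gazetteer's terms that already occur as entity mentions in the original data, one possible way to quantify the overlap the abstract links to impact. The BIO scheme, the lowercase labels `problem`, `treatment` and `test`, every function name and all sample terms are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical data format: a sentence is a list of (token, BIO tag) pairs,
# e.g. ("aspirin", "B-treatment"). Labels are assumed, not from the paper.
Sentence = List[Tuple[str, str]]


def spans(sentence: Sentence) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for each BIO entity span; end is exclusive."""
    result: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, (_, tag) in enumerate(sentence):
        # A B- tag, an O tag, or an I- tag of a different type ends the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                result.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]  # open a new span (lenient for stray I- tags)
    if start is not None:
        result.append((start, len(sentence), label))
    return result


def generate_pseudo_sentence(sentence: Sentence,
                             gazetteers: Dict[str, List[str]],
                             rng: random.Random) -> Sentence:
    """Hypothetical mention replacement: swap each entity for a same-type gazetteer term."""
    out: Sentence = []
    prev = 0
    for start, end, label in spans(sentence):
        out.extend(sentence[prev:start])  # copy the context tokens unchanged
        terms = gazetteers.get(label)
        if terms:
            new_tokens = rng.choice(terms).split()
            out.extend((tok, ("B-" if j == 0 else "I-") + label)
                       for j, tok in enumerate(new_tokens))
        else:
            out.extend(sentence[start:end])  # no gazetteer for this type: keep original
        prev = end
    out.extend(sentence[prev:])
    return out


def entity_mentions(sentences: List[Sentence]) -> Set[str]:
    """Lowercased surface strings of all annotated entities in the original data."""
    mentions: Set[str] = set()
    for s in sentences:
        for start, end, _ in spans(s):
            mentions.add(" ".join(tok for tok, _ in s[start:end]).lower())
    return mentions


def overlap_ratio(gazetteer: List[str], original_mentions: Set[str]) -> float:
    """Fraction of gazetteer terms that also appear as mentions in the original data."""
    if not gazetteer:
        return 0.0
    hits = sum(1 for term in gazetteer if term.lower() in original_mentions)
    return hits / len(gazetteer)


if __name__ == "__main__":
    sentence = [("Patient", "O"), ("was", "O"), ("given", "O"),
                ("aspirin", "B-treatment"), ("for", "O"),
                ("chest", "B-problem"), ("pain", "I-problem"), (".", "O")]
    gazetteers = {"treatment": ["metoprolol", "nitroglycerin"],
                  "problem": ["shortness of breath"]}
    print(generate_pseudo_sentence(sentence, gazetteers, random.Random(0)))
    print(overlap_ratio(["Aspirin", "metoprolol"], entity_mentions([sentence])))
```

Multi-token gazetteer terms are re-tagged with a leading `B-` and trailing `I-` tags so that the generated sentence stays valid BIO; a real pipeline would likely also filter or weight gazetteers, for example by the overlap ratio.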

Authors

  • Jeff Smith
    Catapult Health Inc., Dallas, TX 75254, USA.
  • Evan French
    Virginia Commonwealth University, Richmond, VA, USA.
  • William Cramer
    Virginia Commonwealth University, Richmond, VA, USA.
  • Özlem Uzuner
    George Mason University, Fairfax, VA, USA.
  • Bridget T McInnes
    Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4225, Richmond, VA 23284, USA. Electronic address: btmcinnes@vcu.edu.