Pseudo-data generation for the extraction of Problems, Treatments and Tests.
Journal:
AMIA Joint Summits on Translational Science proceedings
Published Date:
May 17, 2021
Abstract
One of the primary challenges for clinical Named Entity Recognition (NER) is the limited availability of annotated training data. Technical and legal hurdles prevent the creation and release of corpora derived from electronic health records (EHRs). In this work, we examine the impact of gazetteer-based pseudo-data generation on clinical NER with a neural network model. We report that gazetteers can lead the model to include proper terms while excluding determiners and pronouns in preceding and middle positions. Gazetteers that shared more terms with the original dataset had a greater impact.
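The abstract does not spell out how the pseudo-data were produced. As a rough illustration only, one common way to generate gazetteer-based pseudo-data is to take an annotated sentence and swap each entity mention for a gazetteer term of the same type. The sketch below assumes BIO-tagged tokens and the labels `problem`, `treatment` and `test`; the function name `generate_pseudo_sentence`, the gazetteer layout and the example terms are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, BIO tag), e.g. ("pain", "I-problem")


def generate_pseudo_sentence(
    sentence: List[Token],
    gazetteer: Dict[str, List[str]],
    rng: random.Random,
) -> List[Token]:
    """Replace each entity span with a random gazetteer term of the same label.

    Hypothetical helper: an assumed reading of "pseudo-data generation",
    not the procedure described in the paper.
    """
    out: List[Token] = []
    i = 0
    while i < len(sentence):
        word, tag = sentence[i]
        if tag.startswith("B-"):
            label = tag[2:]
            # Find the end of the span: B-label followed by I-label tokens.
            j = i + 1
            while j < len(sentence) and sentence[j][1] == f"I-{label}":
                j += 1
            # Ignore empty gazetteer entries so the replacement has a token.
            terms = [t for t in gazetteer.get(label, []) if t.strip()]
            if terms:
                new_words = rng.choice(terms).split()
                out.append((new_words[0], f"B-{label}"))
                out.extend((w, f"I-{label}") for w in new_words[1:])
            else:
                # No gazetteer entry for this label: keep the original span.
                out.extend(sentence[i:j])
            i = j
        else:
            out.append((word, tag))
            i += 1
    return out


# Illustrative input; the sentence and terms are made up for this sketch.
example = [
    ("Patient", "O"), ("denies", "O"),
    ("chest", "B-problem"), ("pain", "I-problem"), (".", "O"),
]
toy_gazetteer = {"problem": ["shortness of breath"]}
pseudo = generate_pseudo_sentence(example, toy_gazetteer, random.Random(0))
```

Because the surrounding context tokens are left untouched, a substitution of this kind changes only the entity surface forms, which is one way a gazetteer's overlap with the original dataset could influence what the model learns about span boundaries.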