An Exploratory Study on Pseudo-Data Generation in Prescription and Adverse Drug Reaction Extraction.

Journal: Studies in health technology and informatics

Abstract

Prescription information and adverse drug reactions (ADRs) are two components of detailed medication instructions that can benefit many aspects of clinical research. Automatically extracting this information from free-text narratives via Information Extraction (IE) makes it available to downstream applications. IE is commonly tackled by supervised Natural Language Processing (NLP) systems, which rely on annotated training data. However, generating training data is manual, time-consuming, and labor-intensive, so automatic methods for augmenting manually labeled data are desirable. We propose pseudo-data generation as one such method: pseudo-data are synthetic data generated by combining elements of existing labeled data. We propose and evaluate two sets of pseudo-data generation methods, knowledge-driven methods based on gazetteers and data-driven methods based on deep learning, and use the resulting pseudo-data to improve medication and ADR extraction. Data-driven pseudo-data are suitable for concept categories with high semantic regularity and short textual spans. Knowledge-driven pseudo-data are effective for concept categories with longer textual spans, provided the knowledge base offers good coverage of these concepts. Combining knowledge- and data-driven pseudo-data significantly improves performance on medication names and ADRs over baselines that use only the available labeled data.
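The abstract does not spell out how the gazetteer-based (knowledge-driven) pseudo-data are built. One common way to "combine elements of existing labeled data" with a knowledge base is mention substitution: take an annotated sentence and swap each labeled concept mention for a gazetteer entry of the same category, re-emitting BIO tags for the new tokens. The sketch below illustrates that idea only, under stated assumptions; the function names (`find_spans`, `substitute`, `generate_pseudo_data`), the category labels `Drug` and `ADR`, and the whitespace tokenization of gazetteer entries are all hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A labeled sentence: list of (token, BIO tag) pairs, e.g. ("aspirin", "B-Drug").
Sentence = List[Tuple[str, str]]


def find_spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, category) spans (end exclusive) decoded from BIO tags."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if starts_new:
                start, etype = i, tag[2:]
    return spans


def substitute(sentence: Sentence, gazetteer: Dict[str, List[str]],
               rng: random.Random) -> Sentence:
    """Replace each labeled mention with a random gazetteer entry of its category.

    Mentions whose category has no gazetteer entries are kept unchanged.
    (Hypothetical procedure; gazetteer entries are tokenized on whitespace.)
    """
    tags = [tag for _, tag in sentence]
    out: Sentence = []
    prev = 0
    for start, end, etype in find_spans(tags):
        out.extend(sentence[prev:start])  # copy the unlabeled context
        entries = gazetteer.get(etype)
        if entries:
            new_tokens = rng.choice(entries).split()
            out.extend((tok, ("B-" if j == 0 else "I-") + etype)
                       for j, tok in enumerate(new_tokens))
        else:
            out.extend(sentence[start:end])
        prev = end
    out.extend(sentence[prev:])
    return out


def generate_pseudo_data(corpus: List[Sentence], gazetteer: Dict[str, List[str]],
                         copies_per_sentence: int = 1, seed: int = 0) -> List[Sentence]:
    """Create pseudo-sentences from every labeled sentence that has at least one mention."""
    rng = random.Random(seed)
    pseudo: List[Sentence] = []
    for sentence in corpus:
        if not find_spans([tag for _, tag in sentence]):
            continue  # nothing to substitute in this sentence
        for _ in range(copies_per_sentence):
            pseudo.append(substitute(sentence, gazetteer, rng))
    return pseudo


if __name__ == "__main__":
    labeled = [[("Take", "O"), ("aspirin", "B-Drug"), ("daily", "O"), (",", "O"),
                ("may", "O"), ("cause", "O"), ("stomach", "B-ADR"), ("upset", "I-ADR")]]
    gaz = {"Drug": ["ibuprofen", "naproxen sodium"], "ADR": ["severe headache", "nausea"]}
    for s in generate_pseudo_data(labeled, gaz, copies_per_sentence=2):
        print(s)
```

Because each pseudo-sentence keeps the original context and only swaps the labeled span, the tagger sees new surface forms (including longer, multi-token concepts from the knowledge base) in realistic surroundings. This is consistent with the abstract's observation that knowledge-driven pseudo-data help most when the knowledge base covers the target concepts well; if a category has no gazetteer entries, this sketch simply copies the original mention.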

Authors

  • Carson Tao
    Department of Information Science, State University of New York at Albany, NY, USA. Electronic address: mtao@albany.edu.
  • Kahyun Lee
    George Mason University, Fairfax, VA, USA.
  • Michele Filannino
    Department of Computer Science, State University of New York at Albany, NY, USA.
  • Ozlem Uzuner
    Department of Information Studies, University at Albany, SUNY, Albany, NY.